Counting array elements in Ruby (unexpected results by the count( ) function) - arrays

In my understanding the following ruby expressions should produce the same result.
Apparently I am missing something, this is a way too serious bug to go unnoticed...
# returns the number of ALL elements in the array
count = #quotation.quotation_items.count { |x| x.placement == current_placement}
# Does what I expect
count = (#quotation.quotation_items.find_all { |x| x.placement == current_placement }).length
quotation_items above is an ActiveRecord has_many association

#count does not take a block like that.
If you want to use conditions on a count, you would do:
#quotation.quotation_items.count(:conditions => "placement = #{current_placement}")
http://apidock.com/rails/ActiveRecord/Calculations/ClassMethods/count

If you're using ActiveRecord you need to keep in mind that there's a point where it's compiling conditions and clauses for a query and a point where you have a result set. Certain things only work in one mode or the other, though it tries to keep things pretty consistent regardless.
In your case you are expecting count to work like Enumerable, but that's still a database-level operator.
To fix that:
#quotation.quotation_items.where(placement: current_placement).count
That composes a query that counts only the things you need, something approximating:
SELECT COUNT(*) FROM quotation_items WHERE quotation_id=? AND placement=?
That's something that yields a single number and is considerably different than selecting every record, instantiating into models, then counting those using Enumerable.

Your usage of #count is incorrect.
I believe it doesn't accept a block. I'm not sure why it didn't return an error though.
you can use it like this :
count = #quotation.quotation_items.map{ |x| x.placement == current_placement}.count(true)

Related

How do I select for a row using multiple clauses in PACT smart contract language

The PACT documentation clearly states how to select for a single condition in the where clause, but it is not so clear on how to select for multiple clauses which seems much more general and important for real world use cases than a single clause example.
Pact-lang select row function link
For instance I was trying to select a set of dice throws across the room name and the current round.
(select 'throws (where (and ('room "somename") ('round 2)))
But this guy didn't resolve and the error was not so clear. How do I select across multiple conditions in the select function?
The first thing we tried was to simply select via a single clause which returns a list:
(select 'throws (where (and ('room "somename")))
A: [object{throw-schema},object{throw-schema}]
And then we applied the list operator "filter" to the result:
(filter (= 'with-read-function' 1) (select 'throws (where (and ('room "somename"))))
Please keep in mind we had a further function that read the round and spit back the round number and we filtered for equality of the round value.
This ended up working but it felt very janky.
The second thing we tried was to play around with the and syntax and we eventually found a nice way to express it although not quite so intuitive as we would have liked. It just took a little elbow grease.
The general syntax is:
(select 'throws (and? (where condition1...) (where condition2...))
In this case the and clause was lazy, hence the ? operator. We didn't think that we would have to declare where twice, but its much cleaner than the filter method we first tried.
The third thing we tried via direction from the Kadena team was a function we had yet to purview: Fold DB.
(let* ((qry (lambda (k obj) true)) ;;
(f (lambda(x) [(at 'firstName x), (at 'b x)])) ) (fold-db people (qry) (f)) )
This actually is the most correct answer but it was not obvious from the initial scan and would be near inscrutable for a new user to put together with no pact experience.
We suggest a simple sentence ->
"For multiple conditions use fold-db function."
In the documentation.
This fooled us because we are so used to using SQL syntax that we didn't imagine that there was a nice function like this lying around and we got stuck in our ways trying to figure out conditional logic.

Filter Array For IDs Existing in Another Array with Ruby on Rails/Mongo

I need to compare the 2 arrays declared here to return records that exist only in the filtered_apps array. I am using the contents of previous_apps array to see if an ID in the record exists in filtered_apps array. I will be outputting the results to a CSV and displaying records that exist in both arrays to the console.
My question is this: How do I get the records that only exist in filtered_apps? Easiest for me would be to put those unique records into a new array to work with on the csv.
start_date = Date.parse("2022-02-05")
end_date = Date.parse("2022-05-17")
valid_year = start_date.year
dupe_apps = []
uniq_apps = []
# Finding applications that meet my criteria:
filtered_apps = FinancialAssistance::Application.where(
:is_requesting_info_in_mail => true,
:aasm_state => "determined",
:submitted_at => {
"$exists" => true,
"$gte" => start_date,
"$lte" => end_date })
# Finding applications that I want to compare against filtered_apps
previous_apps = FinancialAssistance::Application.where(
is_requesting_info_in_mail: true,
:submitted_at => {
"$exists" => true,
"$gte" => valid_year })
# I'm using this to pull the ID that I'm using for comparison just to make the comparison lighter by only storing the family_id
previous_apps.each do |y|
previous_apps_array << y.family_id
end
# This is where I'm doing my comparison and it is not working.
filtered_apps.each do |app|
if app.family_id.in?(previous_apps_array) == false
then #non_dupe_apps << app
else "No duplicate found for application #{app.hbx_id}"
end
end
end
So what am I doing wrong in the last code section?
Let's check your original method first (I fixed the indentation to make it clearer). There's quite a few issues with it:
filtered_apps.each do |app|
if app.family_id.in?(previous_apps_array) == false
# Where is "#non_dupe_apps" declared? It isn't anywhere in your example...
# Also, "then" is not necessary unless you want a one-line if-statement
then #non_dupe_apps << app
# This doesn't do anything, it's just a string
# You need to use "p" or "puts" to output something to the console
# Note that the "else" is also only triggered when duplicates WERE found...
else "No duplicate found for application #{app.hbx_id}"
end # Extra "end" here, this will mess things up
end
end
Also, you haven't declared previous_apps_array anywhere in your example, you just start adding to it out of nowhere.
Getting the difference between 2 arrays is dead easy in Ruby: just use -!
uniq_apps = filtered_apps - previous_apps
You can also do this with ActiveRecord results, since they are just arrays of ActiveRecord objects. However, this doesn't help if you specifically need to compare results using the family_id column.
TIP: Getting the values of only a specific column/columns from your database is probably best done with the pluck or select method if you don't need to store any other data about those objects. With pluck, you only get an array of values in the result, not the full objects. select works a bit differently and returns ActiveRecord objects, but filters out everything but the selected columns. select is usually better in nested queries, since it doesn't trigger a separate query when used as a part of another query, while pluck always triggers one.
# Querying straight from the database
# This is what I would recommend, but it doesn't print the values of duplicates
uniq_apps = filtered_apps.where.not(family_id: previous_apps.select(:family_id))
I highly recommend getting really familiar with at least filter/select, and map out of the basic array methods. They make things like this way easier. The Ruby docs are a great place to learn about them and others. A very simple example of doing a similar thing to what you explained in your question with filter/select on 2 arrays would be something like this:
arr = [1, 2, 3]
full_arr = [1, 2, 3, 4, 5]
unique_numbers = full_arr.filter do |num|
if arr.include?(num)
puts "Duplicates were found for #{num}"
false
else
true
end
end
# Duplicates were found for 1
# Duplicates were found for 2
# Duplicates were found for 3
=> [4, 5]
NOTE: The OP is working with ruby 2.5.9, where filter is not yet available as an array method (it was introduced in 2.6.3). However, filter is just an alias for select, which can be found on earlier versions of Ruby, so they can be used interchangeably. Personally, I prefer using filter because, as seen above, select is already used in other methods, and filter is also the more common term in other programming languages I usually work with. Of course when both are available, it doesn't really matter which one you use, as long as you keep it consistent.
EDIT: My last answer did, in fact, not work.
Here is the code all nice and working.
It turns out the issue was that when comparing family_id from the set of records I forgot that the looped record was a part of the set, so it would return it, too. I added a check for the ID of the array to match the looped record and bob's your uncle.
I added the pass and reject arrays so I could check my work instead of downloading a csv every time. Leaving them in mostly because I'm scared to change anything else.
start_date = Date.parse(date_from)
end_date = Date.parse(date_to)
valid_year = start_date.year
date_range = (start_date)..(end_date)
comparison_apps = FinancialAssistance::Application.by_year(start_date.year).where(
aasm_state:'determined',
is_requesting_voter_registration_application_in_mail:true)
apps = FinancialAssistance::Application.where(
:is_requesting_voter_registration_application_in_mail => true,
:submitted_at => date_range).uniq{ |n| n.family_id}
#pass_array = []
#reject_array = []
apps.each do |app|
family = app.family
app_id = app.id
previous_apps = comparison_apps.where(family_id:family.id,:id.ne => app.id)
if previous_apps.count > 0
#reject_array << app
puts "\e[32mApplicant hbx id \e[31m#{app.primary_applicant.person_hbx_id}\e[32m in family ID \e[31m#{family.id}\e[32m has registered to vote in a previous application.\e[0m"
else
<csv fields here>
csv << [csv fields here]
end
end
Basically, I pulled the applications into the app variable array, then filtered them by the family_id field in each record.
I had to do this because the issue at the bottom of everything was that there were records present in app that were themselves duplicates, only submitted a few days apart. Since I went on the assumption that the initial app array would be all unique, I thought the duplicates that were included were due to the rest of the code not filtering correctly.
I then use the uniq_apps array to filter through and look for matches in uniq_apps.each do, and when it finds a duplicate, it adds it to the previous_applications array inside the loop. Since this array resets each go-round, if it ever has more than 0 records in it, the app gets called out as being submitted already. Otherwise, it goes to my csv report.
Thanks for the help on this, it really got my brain thinking in another direction that I needed to. It also helped improve the code even though the issue was at the very beginning.

How to detect concurrent dates

I need an algorithm which looks simple, but I still can't think about a well optimised way to do to do this.
I have the following json object:
[
{
"start": "2000-01-01T04:00:00.000Z",
"end": "2020-01-01T08:00:00.000Z"
}, {
"start": "2000-01-01T05:00:00.000Z",
"end": "2020-01-01T07:00:00.000Z"
}
]
As you can see, the second object is inside the range of the first. I need to iterate over this array and return which dates are conflicting.
My project is in ruby on rails right now, but I just need an idea how to implement the algorithm so, any high level programming language would be good.
Any ideas?
First, we can transform the list of hashes to parse the dates into Date objects:
require 'date'
dates = input.map do |hsh|
hsh.transform_values { |str| Date.parse str }
end
Now we can use a nested loop and use Range#cover? to find if there are duplicates:
conflicting = dates.select.with_index do |date, idx|
[date[:start], date[:end]].any? do |date_to_compare|
dates.map.with_index.any? do |date2, idx2|
next if idx == idx2 # so we don't compare to self
(date2[:start]..date2[:end]).cover?(date_to_compare)
end
end
end
Detect a DateTime Object Covered By a Range
There may be a more elegant way to do this, but this seems relatively straightforward to me. The trick is to convert your Hash values into DateTime ranges that can take advantage of the built-in Range#cover? method.
Consider the following:
require 'date'
dates = [
{:start=>"2000-01-01T04:00:00.000Z", :end=>"2020-01-01T08:00:00.000Z"},
{:start=>"2000-01-01T05:00:00.000Z", :end=>"2020-01-01T07:00:00.000Z"},
]
# convert your date hashes into an array of date ranges
date_ranges = dates.map { |hash| hash.values}.map do |array|
(DateTime.parse(array.first) .. DateTime.parse(array.last))
end
# compare sets of dates; report when the first covers the second range
date_ranges.each_slice(2) do |range1, range2|
puts "#{range1} covers #{range2}" if range1.cover? range2
end
Because Range#cover? is Boolean, you might prefer to simply store dates which are covered and do something with them later, rather than taking immediate action on each one. In that case, just use Array#select. For example:
date_ranges.each_slice(2).select { |r1, r2| r1.cover? r2 }
Shove the data into a database using BTREE index on the date fields. Let the DB do the work for you.
Lets say we have the following table:
TABLE myDate {
id BIGINT UNSIGNED, date_start DATETIME, date_end DATETIME
}
Then you want BTREE (or BTREE+) index on date_start and date_end, and HASH index on id.
Once these are in place, feed your table the data, and perform the following select statement to find times that overlap:
-- Query to select dates that are fully contained such as in the example (l contains r):
SELECT l.id, l.date_start, l.date_end, r.id, r.date_start, r.date_end
FROM myDate l JOIN myDate r ON (l.date_start < r.date_start) AND (l.date_end > r.date_end);
-- Query to select dates that overlap on one side:
SELECT l.id, l.date_start, l.date_end, r.id, r.date_start, r.date_end
FROM myDate l JOIN myDate r ON ((l.date_start < r.date_start) AND (l.date_end > r.date_start)) OR ((l.date_start > r.date_start) AND (l.date_end < r.date_start));
Those strings look like ISO 8601 format. You should be able to easily parse that into a Date/DateTime/orsimilar object. Check the docs about those classes, it will be shown there show you cn do that. Then, after parsing into objects, you should be able to compare those date objects simply with </<=/>=/> operators. With this you will be able to compare starts/ends, and you will be able to determine if a date X is:
(a) fully before the other one
(b) startsbefore and ends within the other one
(c) fully within the other one
(d) startswithin and ends after the other one
(e) fully after the other one
(f) is longer and fully contains the other one
I think that's all possibilities, but you better double-check that. Draw them all on time axis if needed and see if there are any other possibilities.
When you have code that can do this classification, you're good to go and implement rest of the logic that bases on that.
but I still can't think about a well optimised way
don't. Write it first in any way, just to get it working and reliable. Understand the problem from the beginning to the end, thoroughly. Then measure its speed and quality. If it's not good, then write a v2 version based on a first-whatever-guess regarding speed/quality observations. Measure and compare. If it's still not good, then collect code, data sets, measurements, make sure test cases and measurements are repeatable by readers that don't have your computer&network&passwords&etc, and then explain the problem and about how to fix/optimize that. Without all of this, asking about "optimization"*) mostly leads to pure guessing.
*) OFC assuming that "well optimized way" wasn't an empty buzzword, but a real question re performance

Why properties referenced in an equality (EQUAL) or membership (IN) filter cannot be projected?

https://developers.google.com/appengine/docs/java/datastore/projectionqueries
Why a projected query such as this : SELECT A FROM kind WHERE A = 1 not supported ?
Because it makes no sense. You are asking
SELECT A FROM kind WHERE A = 1
so, give me A where A = 1. Well, you already know that A = 1. It makes no sense for DB to allow that.
The IN query is internally just a series of equals queries merged together, so the same logic applies to it.
The reasoning behind this could be that since you already have the values of the properties you are querying you don't need them returned by the query. This is probably a good thing in the long run, but honestly, it's something that App Engine should allow anyway. Even if it didn't actually fetch these values from the datastore, it should add them to the entities returned to you behind the scenes so you can go about your business.
Anyway, here's what you can do...
query = MyModel.query().filter(MyModel.prop1 == 'value1', MyModel.prop2 == 'value2)
results = query.fetch(projection=[MyModel.prop3])
for r in results:
r.prop1 = 'value1' # the value you KNOW is correct
r.prop2 = 'value2'
Again, would be nice for this to happen behind the scenes because I don't think it's something anybody should ever care about. If I mention a property in a projection list, I'm already stating that I want that property as part of my entities. I shouldn't have to do any more computation to get that to happen.
On the other hand, it's just an extra for-loop. :)

Google App Engine - Delete until count() <= 0

What is the difference between these 2 pieces of code?
query=Location.all(keys_only=True)
while query.count()>0:
db.delete(query.fetch(5))
# --
while True:
query=Location.all(keys_only=True)
if not query.count():
break
db.delete(query.fetch(5))
They both work.
Logically, these two pieces of code perform the same exact thing - they delete every Location entity, 5 at a time.
The first piece of code is better both in terms of style and (slightly) in terms of performance. (The query itself does not need to be rebuilt in each loop).
However, this code is not as efficient as it could be. It has several problems:
You use count() but do not need to. It would be more efficient to simply fetch the entities, and then test the results to see if you got any.
You are making more round-trips to the datastore than you need to. Each count(), fetch(), and delete() call must go the datastore and back. These round-trips are slow, so you should try to minimize them. You can do this by fetching more entities in each loop.
Example:
q = Location.all(keys_only=True)
results = q.fetch(500)
while results:
db.delete(results)
results = q.fetch(500)
Edit: Have a look at Nick's answer below - he explains why this code's performance can be improved even more by using query cursors.
Here's a solution that's neater, but you may or may not consider to be a hack:
q = Location.all(keys_only=True)
for batch in iter(lambda: q.fetch(500), []):
db.delete(batch)
One gotcha, however, is that as you delete more and more, the backend is forced to skip over the 'tombstoned' entities to find the next ones that aren't deleted. Here's a more efficient solution that uses cursors:
q = Location.all(keys_only=True)
results = q.fetch(500)
while results:
db.delete(results)
q = Location.all(keys_only=True).with_cursor(q.cursor())
results = q.fetch(500)
In the second one, query will be assigned/updated in every loop. I don't know if this is needed with the logic behind it (I don't use google app engine). To replicate this behaviour, the first one would have to look like this:
query=Location.all(keys_only=True)
while query.count()>0:
db.delete(query.fetch(5))
query=Location.all(keys_only=True)
In my oppinion, the first style is way more readable than the second one.
It's too bad you can't do this in python.
query=Location.all(keys_only=True)
while locations=query.fetch(5):
db.delete(locations)
Like in the other P language
while(#row=$sth->fetchrow_array){
do_something();
}

Resources