I'm trying to limit the results of my Realm query. If I have a million records and I call Swift's prefix function, does it touch all million records?
Here's what I'm trying to do:
let objects = realm.objects(BookRealmObject.self)
.sorted(byKeyPath: "createdAt", ascending: false)
let items: [BookType] = {
guard let limit = request.limit, limit > 0 else {
return objects.map { Book(from: $0) }
}
return objects.prefix(limit).map { Book(from: $0) }
}()
The type returned from prefix is Slice<Results<Element>>. Whether a limit is requested by the caller or not, I need to convert it to a plain object to pass to different threads.
Is this the proper way to handle this, or is there a more optimized, concise way to do this?
As we can find in docs:
Since queries in Realm are lazy, performing this sort of paginating behavior isn’t necessary at all, as Realm will only load objects from the results of the query once they are explicitly accessed.
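So taking the prefix of objects should still be lazy. It is the map call that accesses, and therefore loads, each object it visits. With prefix(limit) applied first, map should only visit the objects in that slice rather than the whole result set.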
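To make the lazy-prefix behaviour concrete, here is a minimal sketch that does not use Realm at all. LazyStore and AccessCounter are hypothetical stand-ins for a lazily loaded Results: the subscript counts each element access, so you can see how many elements prefix followed by map actually touches. It only illustrates how Swift's standard collection machinery behaves and says nothing about Realm's internals (for example, the cost of sorting).

```swift
// Hypothetical counter shared by the collection below.
final class AccessCounter { var count = 0 }

// Hypothetical stand-in for a lazily loaded result set.
struct LazyStore: RandomAccessCollection {
    let size: Int
    let counter: AccessCounter
    var startIndex: Int { 0 }
    var endIndex: Int { size }
    subscript(position: Int) -> Int {
        counter.count += 1      // simulate loading one record
        return position
    }
}

let counter = AccessCounter()
let store = LazyStore(size: 1_000_000, counter: counter)

// prefix only computes an index range; map then visits just that slice.
let firstTen = store.prefix(10).map { $0 * 2 }
print(firstTen.count, counter.count)
```

The access count stays tiny compared with the million elements in the store, which is the same reason prefix(limit).map on a lazy collection is cheap relative to mapping everything.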
I'd like to write a Flink streaming operator that maintains say 1500-2000 maps per key, with each map containing perhaps 100,000s of elements of ~100B. Most records will trigger inserts and reads, but I’d also like to support occasional fast iteration of entire nested maps.
I've written a KeyedProcessFunction that creates 1500 RocksDB-backed MapStates per key, and tested it by generating a stream of records with a single distinct key, but I find things perform poorly. Just initialising all of them takes on the order of several minutes, and once data begins to flow, async incremental checkpoints frequently fail due to timeout. Is this a reasonable approach? If not, what alternative(s) should I consider?
Thanks!
Functionally my code is along the lines of:
val stream = env.fromCollection(new Iterator[(Int, String)] with Serializable {
override def hasNext: Boolean = true
override def next(): (Int, String) = {
(1, randomString())
}
})
stream
.keyBy(_._1)
.process(new KPF())
.writeUsingOutputFormat(...)
class KPF extends KeyedProcessFunction[Int, (Int, String), String] {
var states: Array[MapState[Int, String]] = _
override def processElement(
value: (Int, String),
ctx: KeyedProcessFunction[Int, (Int, String), String]#Context,
out: Collector[String]
): Unit = {
if (states(0).isEmpty) {
// insert 0-300,000 random strings <= 100B
}
val state = states(random.nextInt(1500))
// Read from R random keys in state
// Write to W random keys state
// With probability 0.01 iterate entire contents of state
if (random.nextInt(100) == 0) {
state.iterator().forEachRemaining {
// do something trivial
}
}
}
override def open(parameters: Configuration): Unit = {
states = (0 until 1500).map { stateId =>
getRuntimeContext.getMapState(new MapStateDescriptor[Int, String](stateId.toString, classOf[Int], classOf[String]))
}.toArray
}
}
There's nothing in what you've described that's an obvious explanation for poor performance. You are already doing the most important thing, which is to use MapState<K, V> rather than ValueState<Map<K, V>>. This way each key/value pair in the map is a separate RocksDB object, rather than the entire Map being one RocksDB object that has to go through ser/de for every access/update for any of its entries.
To understand the performance better, the next step might be to enable the RocksDB native metrics and study those for clues. RocksDB is quite tunable, and better performance may be achievable. For example, you can tune for your expected mix of reads and writes, and if you are trying to access keys that don't exist, then you should enable bloom filters (which are turned off by default).
The RocksDB state backend has to go through ser/de for every state access/update, which is certainly expensive. You should consider whether you can optimize the serializer; some serializers can be 2-5x faster than others. (Some benchmarks.)
Also, you may want to investigate the new spillable heap state backend that is being developed. See https://flink-packages.org/packages/spillable-state-backend-for-flink, https://cwiki.apache.org/confluence/display/FLINK/FLIP-50%3A+Spill-able+Heap+Keyed+State+Backend, and https://issues.apache.org/jira/browse/FLINK-12692. Early benchmarking suggests this state backend is significantly faster than RocksDB, as it keeps its working state as objects on the heap, and spills cold objects to disk. (How much this would help probably depends on how often you have to iterate.)
And if you don't need to spill to disk, the FsStateBackend would be faster still.
I have a database (parse-server) from which I can fetch objects that contain information. Some of the properties of these objects are used to populate labels in table views. The way I have been populating, let's say, the userName and userLikes labels is as follows:
Appending Different Arrays with the objects properties
var userName = [String]()
var userLikes = [String]()
func query(){
let commentsQuery = PFQuery(className: "UserStuff")
commentsQuery.findObjectsInBackground { (objectss, error) in
if let objects = objectss{
for object in objects{
self.userName.append(object["userName"] as! String)
self.userLikes.append(object["userLikes"] as! String)
}
}
}
}
Ignore the fact that I don't have a .whereKey or any else statements to handle other cases... this is bare bones just for illustration of the question. Anyway, in this method, the userName and userLikes arrays are iterated through to populate the labels. The for object in objects {} loop ensures that the value at a given index in one array (index 0, 1, 2, 3, etc.) comes from the same object as the value at that index in the other array. However, I was wondering if it would be better to do it as follows:
Appending the whole object to a PFObject array
var userObjects = [PFObject]()
func query(){
let commentsQuery = PFQuery(className: "UserStuff")
commentsQuery.findObjectsInBackground { (objectss, error) in
if let objects = objectss{
for object in objects{
self.userObjects.append(object)
}
}
}
}
With this method I could instead populate the labels with something like:
userNameLabel.text = userObjects[0]["userName"] as? String
In this method all properties of the object would be accessible from the same array. I can see that this may have some advantages, but is this definitively the better way to do it/should I switch immediately?
I am going to say that the latter of the two is probably the better method. In the former, the information from a particular object is linked between arrays only by its position in each array. Any accidental or incorrectly written call to .append or .remove could skew the order between arrays, so that an object's name might be at index 3 in the name array while its likes end up at index 4 in the likes array, and this would be difficult to repair. With the latter method, all of an object's properties stay attached to the object itself in the array, and this issue is avoided.
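A minimal sketch of the failure mode described above, using plain Swift values rather than PFObject. The UserStuff struct and the sample data are hypothetical and only mirror the key names from the question.

```swift
// Hypothetical model type; field names mirror the keys used in the question.
struct UserStuff {
    let userName: String
    let userLikes: String
}

// Parallel arrays: related values are tied together only by index.
var userNames = ["ann", "bob", "cat"]
let userLikes = ["3", "7", "1"]
userNames.remove(at: 0)                         // the likes array was not updated
let mismatched = (userNames[0], userLikes[0])   // "bob" paired with ann's likes

// One array of whole objects: a removal cannot separate the fields.
var users = [
    UserStuff(userName: "ann", userLikes: "3"),
    UserStuff(userName: "bob", userLikes: "7"),
    UserStuff(userName: "cat", userLikes: "1"),
]
users.remove(at: 0)
let matched = (users[0].userName, users[0].userLikes)   // "bob" with bob's likes
```

The same reasoning applies when the array holds PFObject values: each row's fields travel together, so no bookkeeping across arrays is needed.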
I am new to Core Data and I am trying to fetch the data of only one column. I am trying the code below:
//MARK :- Fetch All Calculation
func fetchUniqueParentAxis(testID : String) -> [String] {
var arrAxis : [String] = []
let fetchRequest = NSFetchRequest<NSFetchRequestResult>(entityName: "TesCalculation")
fetchRequest.predicate = NSPredicate(format: "testID == %@", testID)
fetchRequest.includesPropertyValues = true
fetchRequest.returnsObjectsAsFaults = false
fetchRequest.propertiesToFetch = ["parentAxis"]
do {
calculation = try AppDelegate.getContext().fetch(fetchRequest) as! [TesCalculation]//try AppDelegate.getContext().fetch(fetchRequest) as! [String : AnyObject]
}
catch let error {
//Handle Error
Helper.sharedInstance.Print(error as AnyObject)
}
}
"parentAxis" is one of my columns, and I want to fetch the data of that column only.
You can absolutely fetch a subset of columns in Core Data. Yes, I agree that Core Data is an object graph solution that can use SQLite as a storage engine. However, fetching a subset of data instead of an entire NSManagedObject has been possible for a very long time, although recent changes have made it slightly different than before.
let fetchRequest = NSFetchRequest<NSDictionary>(entityName: TesCalculation.entity().name!)
fetchRequest.resultType = .dictionaryResultType
fetchRequest.predicate = NSPredicate(format: "testID == %@", testID)
fetchRequest.propertiesToFetch = ["parentAxis"]
do {
let results = try AppDelegate.getContext().fetch(fetchRequest)
} catch let error as NSError {
print(error)
}
What this will return you is something that looks like this:
[
{
"parentAxis": "abc"
},
{
"parentAxis": "xyz"
}
]
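Since the function in the question is meant to return unique parentAxis values as [String], here is a hedged sketch of flattening rows of that shape. It assumes the fetched dictionaries have been bridged to [[String: Any]]; the rows constant is hypothetical sample data standing in for the fetch results.

```swift
// Hypothetical sample rows in the shape shown above (one dictionary per row).
let rows: [[String: Any]] = [
    ["parentAxis": "abc"],
    ["parentAxis": "xyz"],
    ["parentAxis": "abc"],
]

// Pull out the string values, then keep only the first occurrence of each.
var seen = Set<String>()
let uniqueAxes = rows
    .compactMap { $0["parentAxis"] as? String }   // skips missing or non-string values
    .filter { seen.insert($0).inserted }          // true only the first time a value is seen

print(uniqueAxes)
```

Filtering through a Set keeps the original order, which a plain Array(Set(...)) conversion would not guarantee.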
The question was not about performance gain (even though there might be; I truly have no idea!) but rather if/how this can be done and this is how it can be done. Plus, I disagree with other statements made that it "doesn't make sense to have a property without an object." There are plenty of cases where you are allocating objects where all you need is a property or two for the need. This also comes in handy if you are loosely coupling Entities between multiple stores. Of course there are other methods for this as well (your xcdatamodel's Fetched Properties) so it just really depends on the use case.
Just to show that it has been around for some time this is in the header of the NSFetchRequest under NSFetchRequestResultType:
@available(iOS 3.0, *)
public static var dictionaryResultType: NSFetchRequestResultType { get }
Core Data is an object model. It translates rows of the SQL database into objects with properties. You are not running queries directly on the SQL, so you cannot do everything in Core Data that you could do with SQL. You interact with objects, which interact with the database. In Core Data you can have an object that is a fault. That means that none of its properties are loaded into memory, but when you need them (when you access a property) it will fetch them from the database. You cannot have a managed object that has only some of its properties set. They are either all set (the fault has been fired) or none are set (the object is still a fault).
There really isn't much to be gained in most cases by fetching only one or two columns. The extra data transfer is often minimal. The only exception is when you have a large blob of data as a property on the object. In that case you should store the blob on a separate entity and give it a one-to-one relationship. That way the expensive block of data isn't loaded into memory until it is requested.
Question
How can I create an array of objects containing Realm objects?
Code
let realm = try! Realm()
let data: [A] = realm.objects(A)
Error
Cannot invoke 'objects' with an argument list of type '(Object.type)'
How can I create an array of objects containing Realm objects?
From your code sample, I'll further assume that you want to make an array from a Realm Results, not just "standalone" Realm objects.
Since Results conforms to SequenceType, you can use SequenceType.map() to convert it into an array:
let arrayFromResults = results.map({ $0 })
Note, however, that this is almost always the wrong pattern to use.
From your tweet on the same topic, a preferable way to do this would be to encode what you want to display on screen as a Realm query:
self.results = realm.objects(A).filter("poppedOff == NO")
And "popping off" an object (whatever that means) would update the poppedOff property of that object.
Since Realm Results are auto-updating, this won't risk getting out of sync with the contents of the Realm, unlike the array conversion approach, which would have to be updated on every Realm change notification.
I am trying to query from parse.com and I would be receiving about 100 objects at a time. I used the Swift example code on their website, and the app doesn't build with that code. So I looked around and found that people were using code similar to this:
var query = PFQuery(className:"posts")
query.whereKey("post", equalTo: "true")
query.findObjectsInBackgroundWithBlock({ (objects: [AnyObject]?, error: NSError?) -> Void in
// do something
self.myDataArray = objects as! [String]
})
This does not work, because I am trying to convert PFObject to String.
I need to get one value from each object into a Swift string array, [String]. How do I get just the one text value instead of the PFObject, and how do I get it into the Swift string array?
I don't speak Swift very well, but the problem with the code is that it tries to cast the returned PFObject to a string, whereas you want to extract a string attribute, so (if you really want to do it):
for object in objects ?? [] {
// conditional cast, so a missing or non-string value is skipped instead of crashing
if let someString = object.valueForKey("someAttributeName") as? String {
self.myDataArray.append(someString)
}
}
But please make sure you need to do this. I've noticed a lot of new parse/swift users (especially those who are populating tables) have the urge to discard the returned PFObjects in favor of just one of their attributes. Consider keeping the PFObjects and extracting the attributes later as you need them. You might find you'll need other attributes, too.
For starters, I would definitely recommend using the "if let" pattern to qualify your incoming data. This is a nice Swift feature that will help avoid run-time errors.
var query = PFQuery(className:"posts")
query.whereKey("post", equalTo: "true")
query.findObjectsInBackgroundWithBlock(
{ (objects: [AnyObject]?, error: NSError?) -> Void in
// check your incoming data and try to cast to an array of PFObject.
if let foundPosts = objects as? [PFObject]
{
// iterate over posts and try to extract the attribute you're after
for post in foundPosts
{
// this won't crash if the value is nil
if let foundString = post.objectForKey("keyForStringYouWant") as? String
{
// found a good data value and was able to cast to string, add it to your array!
self.myDataArray.append(foundString)
}
}
}
})
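The nested "if let" pattern above can also be written with compactMap, which discards every nil produced by a failed lookup or cast. This is only a sketch under stated assumptions: FakeObject is a hypothetical stand-in for PFObject with a single lookup method, and the "text" key and sample values are invented for illustration.

```swift
// Hypothetical stand-in for PFObject: just enough to look values up by key.
struct FakeObject {
    let fields: [String: Any]
    func object(forKey key: String) -> Any? { fields[key] }
}

// Hypothetical query result; optional, like the array handed to the callback.
let objects: [FakeObject]? = [
    FakeObject(fields: ["text": "first"]),
    FakeObject(fields: ["text": 42]),       // wrong type, skipped
    FakeObject(fields: [:]),                // missing key, skipped
    FakeObject(fields: ["text": "second"]),
]

// A nil result set becomes an empty array; failed casts are dropped.
let strings: [String] = (objects ?? []).compactMap {
    $0.object(forKey: "text") as? String
}
print(strings)
```

The result contains only the values that were present and really were strings, which is the same guarantee the explicit loop gives.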