Compare two big arrays value for value in Node.js

I have two arrays, one containing 200,000 product objects coming from a CSV file and one containing 200,000 product objects coming from a database.
Both arrays contain objects with the same fields, with one exception: the database objects also have a unique ID.
I need to compare all 200,000 CSV objects with the 200,000 database objects. If a CSV object already exists in the database array, I put it in an "update" array together with the ID of the match; if it doesn't, I put it in a "new" array.
When done, I update all the "update" objects in the database and insert all the "new" ones. This is fast (a few seconds).
The compare step, however, takes hours. I need to compare three values: the channel (string), date (date) and time (string). If all three are the same, it's a match; if any of them differs, it's not a match.
This is the code I have:
const newProducts = [];
const updateProducts = [];
csvProducts.forEach((csvProduct) => {
  // check if there is a match
  const match = dbProducts.find((dbProduct) => {
    return dbProduct.channel === csvProduct.channel && moment(dbProduct.date).isSame(moment(csvProduct.date), 'day') && dbProduct.start_time === csvProduct.start_time;
  });
  if (match) {
    // we found a match, add it to updateProducts array
    updateProducts.push({
      id: match.id,
      ...csvProduct
    });
    // remove the match from the dbProducts array to speed things up
    _.pull(dbProducts, match);
  } else {
    // no match, it's a new product
    newProducts.push(csvProduct);
  }
});
I am using lodash and moment.js libraries.
The bottleneck is in the check if there is a match, any ideas on how to speed this up?

This is a job for the Map collection class. Arrays are a hassle because they must be searched linearly; Maps (and Sets) support fast (average constant-time) lookups by key. You want to do your matching in RAM rather than hitting your db for every single object in your incoming file.
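So, first read every record in your database and construct a Map whose keys are strings built from start_time, date and channel, and whose values are ids. The key must be a primitive such as a string, not an object like {start_time, date, channel}: a Map compares object keys by reference, so a freshly built object with the same contents would never find the stored entry. Normalize the date to a day string (for example with moment(...).format('YYYY-MM-DD')) so that two dates on the same day produce the same key, matching the isSame(..., 'day') check in the question. The order of the fields inside the key doesn't affect lookup speed; it only has to be the same everywhere.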
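To see why the key has to be a primitive such as a string, here is a minimal check in plain JavaScript (the field values are made up for illustration):

```javascript
// Map compares object keys by reference, not by content.
const byObject = new Map();
byObject.set({ channel: "A", start_time: "10:00" }, 1);
// A structurally identical but different object is a different key:
console.log(byObject.get({ channel: "A", start_time: "10:00" })); // undefined

// A string key is compared by value, so an equal string finds the entry.
const byString = new Map();
byString.set("A|10:00", 1);
console.log(byString.get("A|10:00")); // 1
```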
Something like this:
const productsInDb = new Map();
for (const entry of dbProducts) {
  // make your keys EXACTLY the same when you load your Map...
  const key = `${entry.start_time}|${moment(entry.date).format('YYYY-MM-DD')}|${entry.channel}`;
  productsInDb.set(key, entry.id);
}
This will take a whole mess of RAM, but so what? It's what RAM is for.
Then do your matching more or less the way you did it in your example, but using your Map.
const newProducts = [];
const updateProducts = [];
csvProducts.forEach((csvProduct) => {
  // ...and build the key the same way when you look entries up in the Map
  const key = `${csvProduct.start_time}|${moment(csvProduct.date).format('YYYY-MM-DD')}|${csvProduct.channel}`;
  const id = productsInDb.get(key);
  if (id !== undefined) {
    // we found a match, add it to updateProducts array
    updateProducts.push({
      id,
      ...csvProduct
    });
    // don't bother to update your Map here
    // unless you need to do something about dups in your csv file
  } else {
    // no match, it's a new product
    newProducts.push(csvProduct);
  }
});
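Putting both steps together, here is a self-contained sketch that runs without moment.js. It assumes that reducing each date to its UTC calendar day is acceptable (the moment.js version above compares by local day instead) and that the "|" separator never occurs inside the field values; dayKey, productKey and partitionProducts are hypothetical names, and the sample data is made up.

```javascript
// Assumption: comparing by UTC calendar day is acceptable here.
function dayKey(date) {
  return new Date(date).toISOString().slice(0, 10); // "YYYY-MM-DD"
}

// One primitive string per product; "|" is assumed not to occur in the fields.
function productKey(p) {
  return `${p.channel}|${dayKey(p.date)}|${p.start_time}`;
}

function partitionProducts(csvProducts, dbProducts) {
  // Single pass over the database rows: key -> id
  const idByKey = new Map();
  for (const dbProduct of dbProducts) {
    idByKey.set(productKey(dbProduct), dbProduct.id);
  }

  const newProducts = [];
  const updateProducts = [];
  // Single pass over the CSV rows; each Map.get is O(1) on average
  for (const csvProduct of csvProducts) {
    const id = idByKey.get(productKey(csvProduct));
    if (id !== undefined) {
      updateProducts.push({ id, ...csvProduct });
    } else {
      newProducts.push(csvProduct);
    }
  }
  return { newProducts, updateProducts };
}

// Tiny made-up sample data
const dbSample = [
  { id: 1, channel: "A", date: "2018-01-01T08:00:00Z", start_time: "10:00" },
  { id: 2, channel: "B", date: "2018-01-01T08:00:00Z", start_time: "11:00" },
];
const csvSample = [
  { channel: "A", date: "2018-01-01T20:00:00Z", start_time: "10:00" }, // same UTC day as id 1
  { channel: "C", date: "2018-01-02T08:00:00Z", start_time: "12:00" },
];
const result = partitionProducts(csvSample, dbSample);
console.log(result.updateProducts.map((p) => p.id)); // [ 1 ]
console.log(result.newProducts.length); // 1
```

This replaces the nested find (roughly 200,000 × 200,000 comparisons in the worst case) with two linear passes, and the _.pull call is no longer needed because nothing is removed from an array.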

Related

Find objects that include an array that contains all elements of a second array

I'm trying to filter a set of objects based on the values in one of their properties, using another array. I've got it working with filter just fine if the search is "OR": it gives me all objects for which at least one of the strings in the search array is found.
But I can't figure out how to make it work as an AND search - returning only the objects that match ALL of the strings in the search array.
Example:
struct Schedule {
    let title: String
    let classTypes: [String]
}
let schedule1 = Schedule(title: "One", classTypes: ["math","english","chemistry","drama"])
let schedule2 = Schedule(title: "Two", classTypes: ["pe","math","biology"])
let schedule3 = Schedule(title: "Three", classTypes: ["english","history","math","art"])
let schedules = [schedule1, schedule2, schedule3]
let searchArray = ["math", "english"]
//works for OR - "math" or "english"
var filteredSchedules = schedules.filter { $0.classTypes.contains(where: { searchArray.contains($0) }) }
I'd like to find a way for it to use the same search array
let searchArray = ["math", "english"]
But only return items 1 & 3 - as they both have BOTH math and english in the list.
There are good examples of AND conditions when the AND is across different search criteria: car type and colour - but I've been unable to find an example where the criteria are dynamically based on items in an array. For context, I could have dozens of schedules with 20+ class types.
You can work with a Set: isSubset(of:) returns true if the schedule's classTypes contain all elements of searchSet.
let searchSet = Set(searchArray)
var filteredSchedules = schedules.filter { searchSet.isSubset(of: $0.classTypes) }
As suggested by @LeoDabus, it might be worth changing the type of classTypes from Array to Set (if order doesn't matter), since the values seem to be unique. Then the filtering can be done the other way around, without having to convert searchArray into a Set:
var filteredSchedules = schedules.filter { $0.classTypes.isSuperset(of: searchArray) }

Node fast way to find in array

I have a problem.
My script worked fine and fast when there were only up to about 5,000 objects in my array.
Now there are over 20,000 objects and it runs slower and slower...
This is how I call it:
for(var h in ItemsCases) {
if(itmID == ItemsCases[h].sku) {
I loop with "for" over every object and check whether its sku is my itmID, because I don't want all of ItemsCases, only a few of them each time.
But what is the fastest and best way to get the items with the sku I need?
I don't think mine is the fastest...
I now get multiple items with this code:
var skus = res.response.cases[x].skus;
for(var j in skus) {
var itmID = skus[j];
for(var h in ItemsCases) {
if(itmID == ItemsCases[h].sku) {
skus is also an array.
ItemsCases.find(item => item.sku === itmID) (or a for loop like yours, depending on the implementation) is the fastest you can do with an array (if you can have multiple items returned, use filter instead of find).
Use a Map or an object lookup if you need to be faster than that. It does need preparation and memory, but if you are searching a lot it may well be worth it. For example, using a Map:
// preparation of the lookup
const ItemsCasesLookup = new Map();
ItemsCases.forEach(item => {
  const list = ItemsCasesLookup.get(item.sku);
  if (list) {
    list.push(item);
  } else {
    ItemsCasesLookup.set(item.sku, [item]);
  }
});
then later you can get all items for the same sku like this:
ItemsCasesLookup.get(itmID);
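A sketch of how the lookup could replace the question's nested loop over skus: build the Map once, then do one Map lookup per sku. buildSkuLookup and itemsForSkus are hypothetical helper names and the data is made up. Note that Map keys are compared with SameValueZero, so the string "123" and the number 123 are different keys; the question's == comparison would have treated them as equal.

```javascript
// Build the sku -> items lookup once (same idea as the Map above).
function buildSkuLookup(items) {
  const lookup = new Map();
  for (const item of items) {
    const list = lookup.get(item.sku);
    if (list) {
      list.push(item);
    } else {
      lookup.set(item.sku, [item]);
    }
  }
  return lookup;
}

// Resolve a whole list of skus with one Map lookup each,
// instead of scanning ItemsCases once per sku.
function itemsForSkus(lookup, skus) {
  const found = [];
  for (const sku of skus) {
    const list = lookup.get(sku);
    if (list) found.push(...list);
  }
  return found;
}

// Made-up sample data shaped like ItemsCases
const ItemsCases = [
  { sku: "a1", name: "first" },
  { sku: "b2", name: "second" },
  { sku: "a1", name: "third" },
];
const lookup = buildSkuLookup(ItemsCases);
const matches = itemsForSkus(lookup, ["a1", "zz"]);
console.log(matches.map((item) => item.name)); // [ 'first', 'third' ]
```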
A compromise (no extra memory, but some speedup) can be achieved by pre-sorting your array and then using a binary search on it, which is much faster than the linear search you have to do on an unprepared array.
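A minimal sketch of the sort-then-binary-search compromise, assuming all skus have the same type (here strings) so that < gives a consistent order. lowerBound finds the first position whose sku is not less than the target; since equal skus end up next to each other after sorting, all matches can then be collected with a short forward scan. sortBySku, lowerBound and findAllBySku are hypothetical names.

```javascript
// Sort once by sku (string comparison); copy so the original order is kept.
function sortBySku(items) {
  return [...items].sort((a, b) => (a.sku < b.sku ? -1 : a.sku > b.sku ? 1 : 0));
}

// Lower bound: index of the first item whose sku is >= target, O(log n).
function lowerBound(sorted, target) {
  let lo = 0;
  let hi = sorted.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (sorted[mid].sku < target) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  return lo;
}

// All items with exactly this sku sit next to each other after sorting.
function findAllBySku(sorted, target) {
  const found = [];
  for (let i = lowerBound(sorted, target); i < sorted.length && sorted[i].sku === target; i++) {
    found.push(sorted[i]);
  }
  return found;
}

const sortedItems = sortBySku([
  { sku: "c3", name: "x" },
  { sku: "a1", name: "y" },
  { sku: "b2", name: "z" },
  { sku: "a1", name: "w" },
]);
console.log(findAllBySku(sortedItems, "a1").length); // 2
console.log(findAllBySku(sortedItems, "zz").length); // 0
```

Each lookup is O(log n) after a one-time O(n log n) sort, compared with O(1) average for the Map, which costs extra memory instead.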

Splitting an array of objects based on unique combinations of two properties

For my time-tracking app, I'm storing time entries in an array of objects each looking like this:
{
  date: "20181206",
  hours: "4",
  projectId: "65WchP9X46HlOYUzmWrL",
  taskId: "fJTU7wggJHbg1uRuRUdHjS5mn8J3"
}
I need to group, into separate arrays, all objects by unique combinations of ProjectId - TaskId properties, so that each resulting array gets all entries for one particular project and one particular task (an array for project #1 / task #1, another for project #1 / task #2, another for project #2 / task #1, etc).
I know how to do that with one property, but is there a simple way to achieve the same with two properties?
Note: my global array is initially populated like so:
weekTimes = [];
querySnapshot.forEach((doc) => {
  weekTimes.push(doc.data());
});
Create a composite key from projectId and taskId:
const grouped = {};
for (const weekTime of weekTimes) {
  const key = `${weekTime.projectId}/${weekTime.taskId}`;
  if (grouped[key] === undefined) {
    // if the key doesn't exist, weekTime is the first item: create the key
    // and assign an array with the first item to it
    grouped[key] = [weekTime];
  } else {
    // if the key already exists in grouped, just push the new item to it
    grouped[key].push(weekTime);
  }
}
// if you don't need the keys
Object.values(grouped)
Or use lodash's groupBy:
const _ = require('lodash')
_.groupBy(weekTimes, x => x.projectId + '/' + x.taskId)

NGRX - can't set the state tree as I would like it to be

So I'm using ngrx for managing the state in my application. I tried to add a new property (selected shifts) which should look like this:
state: {
  shifts: {
    selectedShifts: [
      [employeeId]: [
        [shiftId]: shift
      ]
    ]
  }
}
at the moment, my state looks like this:
state: {
  selectedShifts: {
    [employeeId]: {
      [shiftId]: shift
    }
  }
}
So as you can see, my "selectedShifts" is a property holding an object, not an array, which makes it difficult to add/remove/query the state.
How do I compose the state to look like I want it?
This is what I tried in the reducer:
return {
  ...state,
  selectedShifts: {
    ...state.selectedShifts,
    [action.payload.employeeId]: {
      ...state.selectedShifts[action.payload.employeeId],
      [action.payload.shiftId]: action.payload[shift.shiftId]
    }
  }
};
Now when I try to return the state in the way I'd like to, this is the result:
state: {
  selectedShifts: {
    [action.payload.employeeId]:
      [0]: {[action.payload.shiftId]: { shift }}
  }
}
What am I missing here? When I try to replace the {} items which should be [] this error comes up: "," expected.
Oh yea, I would like the index of the array to be the id of the specific shift and not [0], [1]...
Is this possible at all?
Would it be a bad idea to change the index from numerics to the actual shift's id?
Array length misbehaves when you add data at arbitrary numeric indexes. This can get you into trouble with array methods that use length (join, slice, indexOf, etc.) and with methods that alter length (push, splice, etc.).
var fruits = [];
fruits.push('banana', 'apple', 'peach');
console.log(fruits.length); // 3
When setting a property on a JavaScript array when the property is a valid array index and that index is outside the current bounds of the array, the engine will update the array's length property accordingly:
fruits[5] = 'mango';
console.log(fruits[5]); // 'mango'
console.log(Object.keys(fruits)); // ['0', '1', '2', '5']
console.log(fruits.length); // 6
There is no problem selecting or updating state stored in an object; it's just a bit different from what you're probably used to. With a plain hashmap { objectId: Object }, finding the object to update or remove is as fast as it gets when the change is identified by the object id.
I know your problem is related to NGRX, but reading the Redux immutable update patterns will definitely help you with adding, updating and removing objects in the state: https://redux.js.org/recipes/structuring-reducers/immutable-update-patterns
Generally you don't want arrays in state (at least not large arrays); object hashmaps are a lot better.
To get an array of the selected shifts per user for your views, you could do something like the following. Note this is not an array indexed by shift id, just an array of shifts under each userId property, derived from the original state shape:
state: {
  selectedShifts: {
    [employeeId]: {
      [shiftId]: shift
    }
  }
}
import { map } from 'rxjs/operators'; // RxJS 6+: operators go through pipe()

const getSelectedShiftsAsArray = this.store.select( getSelectedShifts() ).pipe(
  map(userShifts => {
    // get array of user (employee) ids
    const userIds = Object.keys( userShifts );
    const ret = {};
    for( const userId of userIds ) {
      // convert Dictionary<Shift> into a Shift[]
      // get array of shift ids for this user
      const shiftIds = Object.keys( userShifts[userId] );
      // map array of shift ids into shift object array
      const collectedShifts = shiftIds.map( shiftId => userShifts[userId][shiftId] );
      // return value for a userId
      ret[userId] = collectedShifts;
    }
    return ret;
  })
);
The code is completely untested and meant as a reference one level up from pseudocode. You could easily convert it into an NgRx selector, though. The state is there just for storage; how you model it for use in components is up to the selector functions and the components themselves.
If you really need it, you could keep both the ids and the shifts by replacing ret[userId] = collectedShifts; with:
ret[userId] = { shiftIds: shiftIds, shifts: collectedShifts };
But it really depends on how you plan to use these.
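The same projection can also be written as a plain, framework-free function, which is easy to unit-test and should be usable as the projector argument of NgRx's createSelector. A sketch, assuming the { [employeeId]: { [shiftId]: shift } } shape from the question; selectedShiftsAsArrays is a hypothetical name and the sample state is made up.

```javascript
// Turns { [employeeId]: { [shiftId]: shift } } into { [employeeId]: shift[] }.
function selectedShiftsAsArrays(selectedShifts) {
  const result = {};
  for (const [employeeId, shiftsById] of Object.entries(selectedShifts)) {
    result[employeeId] = Object.values(shiftsById);
  }
  return result;
}

// Made-up sample state slice
const selectedShifts = {
  emp1: { s1: { id: "s1", hours: 8 }, s2: { id: "s2", hours: 4 } },
  emp2: { s3: { id: "s3", hours: 6 } },
};
const asArrays = selectedShiftsAsArrays(selectedShifts);
console.log(asArrays.emp1.map((shift) => shift.id)); // [ 's1', 's2' ]
```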
From my personal experience I would separate shift entities from selectedShifts but how you organise your state is completely up to you.
state: {
  shifts: {
    // contains shift entities as object property map id: entity
    entities: Dictionary<Shift>,
    selectedShifts: {
      [employeeId]: number[] // contains ids for shifts
    }
  }
}
Now updating, removing or adding a shift is just a matter of setting the updated data at the path shifts.entities[entityId].
Selecting a shift for an employeeId means checking whether its id is already in selectedShifts[employeeId] and appending it to that array if it isn't. (If these arrays are humongous, I'd use an object hash here too for fast access: <employeeId>: {shiftId: shiftId}.)
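A minimal sketch of that "append if not already selected" step as an immutable update, assuming the normalized shape above (state.shifts.selectedShifts is { [employeeId]: shiftId[] }). selectShift is a hypothetical helper, not an NgRx API; in a real reducer the employeeId and shiftId would come from the action payload.

```javascript
// Returns a new state object; the input state is never mutated.
function selectShift(state, employeeId, shiftId) {
  const current = state.shifts.selectedShifts[employeeId] || [];
  if (current.includes(shiftId)) {
    return state; // already selected: same reference, nothing changed
  }
  return {
    ...state,
    shifts: {
      ...state.shifts,
      selectedShifts: {
        ...state.shifts.selectedShifts,
        [employeeId]: [...current, shiftId],
      },
    },
  };
}

const before = { shifts: { entities: {}, selectedShifts: { emp1: [1] } } };
const after = selectShift(before, "emp1", 2);
console.log(after.shifts.selectedShifts.emp1); // [ 1, 2 ]
console.log(before.shifts.selectedShifts.emp1); // [ 1 ]
console.log(selectShift(after, "emp1", 2) === after); // true
```

Array.prototype.includes is a linear scan, which is why an object hash is the better fit once these per-employee arrays get large.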
Check also:
redux: state as array of objects vs object keyed by id

Sorted array: how to get position before and after using name? as3

I have been working on a project and Stack Overflow has helped me with a few problems so far, so I am very thankful!
My question is this:
I have an array like this:
var records:Object = {};
var arr:Array = [
records["nh"] = { medinc:66303, statename:"New Hampshire"},
records["ct"] = { medinc:65958, statename:"Connecticut"},
records["nj"] = { medinc:65173, statename:"New Jersey"},
records["md"] = { medinc:64596, statename:"Maryland"},
etc... for all 50 states. And then I have the array sorted reverse numerically (descending) like this:
arr.sortOn("medinc", Array.NUMERIC);
arr.reverse();
Can I call the name of the record (i.e. "nj" for new jersey) and then get the value from the numeric position above and below the record in the array?
Basically, medinc is the median income of US states, and I am trying to show a ranking system... a user would click Texas, for example, and it would show the medinc value for Texas, along with the state that ranks one position below and the state that ranks one position above in the array.
Thanks for your help!
If you know the object, you can use Array.indexOf() on the sorted array. Since arr is sorted in descending order, index 0 is the top of the ranking:
var index:int = arr.indexOf(records["nj"]);
var above:Object; // one rank higher (higher medinc)
var below:Object; // one rank lower (lower medinc)
if (index > 0) { // make sure you're not already at the top
    above = arr[index - 1];
}
if (index + 1 < arr.length) { // make sure you're not already at the bottom
    below = arr[index + 1];
}
I think this is the answer based on my understanding of your data.
var index:int = arr.indexOf(records["nh"]);
That will get you the index of the record that was clicked on; then, to find the ones above and below:
var clickedRecord:Object = arr[index];
var higherRecord:Object = arr[index - 1]; // undefined if index is 0 (top of the ranking)
var lowerRecord:Object = arr[index + 1];  // undefined if index is the last position
Hope that answers your question
Do you really need records to be a hash?
If not, you can simply move the key into a field of the record and change records to a simple array:
var records: Array = new Array();
records.push({ short: "nh", medinc:66303, statename:"New Hampshire"});
records.push({ short: "ct", medinc:65958, statename:"Connecticut"});
....
This gives you the opportunity to create a class for State, change Array to Vector and make all of this type-safe, which is always good.
If you really need those keys, you can add objects like the ones above (with a "short" field) the same way you are doing it now, perhaps with a helper function that saves you from typing the short name twice, like addState(records, data) { records[data.short] = data }.
Finally, you can also keep those records in two structures (an object and an array, or whatever you need). This isn't expensive if you create each state object once and keep references to it in the array/object/vector. It's a good idea if you often need the states sorted on different keys.
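A sketch of the "two structures" idea applied to the original question, written in JavaScript (its syntax is close to AS3): one array ranked by income, plus a Map from state code to rank position, both pointing at the same record objects. neighbours is a hypothetical helper and only four states from the question are used.

```javascript
// Subset of the question's data, keyed by state code.
const records = {
  nh: { medinc: 66303, statename: "New Hampshire" },
  ct: { medinc: 65958, statename: "Connecticut" },
  nj: { medinc: 65173, statename: "New Jersey" },
  md: { medinc: 64596, statename: "Maryland" },
};

// Ranked array, highest income first (like sortOn + reverse in the question).
// ranked[i] is [code, record]; the record objects are shared references, not copies.
const ranked = Object.entries(records).sort(([, a], [, b]) => b.medinc - a.medinc);

// Second index over the same data: state code -> rank position.
const rankByCode = new Map(ranked.map(([code], index) => [code, index]));

// The clicked state plus its neighbours in the ranking (null at either end).
function neighbours(code) {
  const index = rankByCode.get(code);
  if (index === undefined) return null;
  const at = (i) => (i >= 0 && i < ranked.length ? ranked[i][1] : null);
  return { state: at(index), higher: at(index - 1), lower: at(index + 1) };
}

const nj = neighbours("nj");
console.log(nj.higher.statename, nj.lower.statename); // Connecticut Maryland
```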
This is not really a good way to set up your data: it involves too much typing (you repeat "records", "medinc" and "statename" over and over again), which you could have avoided, for example:
var records:Array = [];
var states:Array = ["nh", "ct", "nj" ... ];
var statenames:Array = ["New Hampshire", "Connecticut", "New Jersey" ... ];
var medincs:Array = [66303, 65958, 65173 ... ];
var hash:Object = { };
function addState(state:String, medinc:int, statename:String, hash:Object):Object
{
    return hash[state] = { medinc: medinc, statename: statename };
}
for (var i:int = 0; i < 50; i++)
{
    records[i] = addState(states[i], medincs[i], statenames[i], hash);
}
Since you've already typed it all out, this isn't essential, but it could save you some keystrokes next time.
Now, onto your search problem. First of all, it is indeed worth sorting the array before you search, and if you need to search the array by the value of the property it was sorted on, there is a better algorithm than a linear scan. That is, if, given the data in your example, your task was to find the state whose income is 65958, then, knowing that the array is sorted on income, you could use binary search.
For the example with 50 states the difference will not be noticeable unless you do it hundreds of thousands of times per second, but in general binary search is the way to go.
If the Wikipedia article looks too long to read ;) the idea behind binary search is this: first you guess that the searched value is exactly in the middle of the array. You test that guess; if it's correct, you return the index you just found, otherwise you keep only the half of the remaining interval that must contain the value (the array is sorted, so one comparison tells you which half) and repeat, until you either find the value or the interval becomes empty, which means the value is not in the array. This reduces the asymptotic complexity of the search from O(n) to O(log n).
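The steps described above can be sketched in JavaScript for the question's descending order (highest income first, as after sortOn + reverse); because the order is descending, the comparisons are flipped relative to the usual ascending version. findByIncomeDesc is a hypothetical name and the data is the four states from the question.

```javascript
// Binary search over an array sorted by medinc in DESCENDING order.
// Returns the index of an entry with that income, or -1 if there is none.
function findByIncomeDesc(arr, income) {
  let lo = 0;
  let hi = arr.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >>> 1; // guess the middle of the remaining interval
    const value = arr[mid].medinc;
    if (value === income) return mid;
    if (value > income) {
      lo = mid + 1; // descending order: smaller incomes are to the right
    } else {
      hi = mid - 1; // larger incomes are to the left
    }
  }
  return -1; // interval is empty: the value is not present
}

const arr = [
  { medinc: 66303, statename: "New Hampshire" },
  { medinc: 65958, statename: "Connecticut" },
  { medinc: 65173, statename: "New Jersey" },
  { medinc: 64596, statename: "Maryland" },
];
console.log(findByIncomeDesc(arr, 65958)); // 1
console.log(findByIncomeDesc(arr, 12345)); // -1
```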
Now, if your goal was to find the correspondence between income and state, and the state's position relative to the other states (i.e. its index in the array) was not important, you could have another hash table where the income is the key and the state information object is the value, using my example above:
function addState(state:String, medinc:int, statename:String,
                  hash:Object, incomeHash:Object):Object
{
    return incomeHash[medinc] =
        hash[state] = { medinc: medinc, statename: statename };
}
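Then incomeHash[medinc] gives you the state by income in O(1) time, assuming no two states share exactly the same medinc (a later addState call would overwrite an earlier entry with the same income).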
