Node: fast way to find items in an array

I have a problem.
My script ran fine and fast while there were only up to about 5,000 objects in my array.
Now there are over 20,000 objects and it runs slower and slower.
This is how I call it:
for(var h in ItemsCases) {
if(itmID == ItemsCases[h].sku) {
I loop over every object with "for" and check whether its sku equals my itmID, because I don't want every entry of ItemsCases, only a few of them each time.
But what is the fastest and best way to get the items with the sku I need out of it?
I don't think mine is the fastest.
With this code I now look up multiple items:
var skus = res.response.cases[x].skus;
for(var j in skus) {
var itmID = skus[j];
for(var h in ItemsCases) {
if(itmID == ItemsCases[h].sku) {
skus is also an array.

ItemsCases.find(item => item.sku === itmID) (or a for loop like yours, depending on the implementation) is the fastest you can do with an array (if you can have multiple items returned, use filter instead of find).
Use a Map or an object lookup if you need to be faster than that. It does need preparation and memory, but if you are searching a lot it may well be worth it. For example, using a Map:
// preparation of the lookup
const ItemsCasesLookup = new Map();
ItemsCases.forEach(item => {
const list = ItemsCasesLookup.get(item.sku);
if (list) {
list.push(item)
} else {
ItemsCasesLookup.set(item.sku, [item]);
}
});
then later you can get all items for the same sku like this:
ItemsCasesLookup.get(itmID);
A compromise (not more memory, but some speedup) can be achieved by pre-sorting your array, then using a binary search on it, which is much faster than linear search you have to do on an unprepared array.
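The pre-sort plus binary search compromise could look roughly like the sketch below. It assumes every item has a string sku; the helper names sortBySku, lowerBound and findBySku are made up for illustration and are not part of the question's code.

```javascript
// Sort a copy once by sku (plain string comparison).
// Assumption: every item has a string `sku`.
function sortBySku(items) {
  return [...items].sort((a, b) => (a.sku < b.sku ? -1 : a.sku > b.sku ? 1 : 0));
}

// Index of the first item whose sku is >= target (a "lower bound" binary search).
function lowerBound(sorted, target) {
  let lo = 0;
  let hi = sorted.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (sorted[mid].sku < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// All items with the given sku: O(log n) to find the first one,
// then a short scan over the adjacent duplicates.
function findBySku(sorted, target) {
  const matches = [];
  for (let i = lowerBound(sorted, target); i < sorted.length && sorted[i].sku === target; i++) {
    matches.push(sorted[i]);
  }
  return matches;
}
```

The array has to be re-sorted (or the new item inserted at the right position) whenever ItemsCases changes, so this pays off mainly when lookups far outnumber updates.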

Related

Compare two big arrays value for value in Node.js

I have two arrays, one containing 200.000 product objects coming from a CSV file and one containing 200.000 product objects coming from a database.
Both arrays contain objects with the same fields, with one exception: the database objects have a unique ID as well.
I need to compare all 200.000 CSV objects with the 200.000 database objects. If the CSV object already exists in the database objects array I put it in an "update" array together with the ID from the match, and if it doesn't, then I put it in a "new" array.
When done, I update all the "update" objects in the database, and insert all the "new" ones. This goes fast (a few seconds).
The compare step however takes hours. I need to compare three values: the channel (string), date (date) and time (string). If all three are the same, it's a match. If one of those isn't, then it's not a match.
This is the code I have:
const newProducts = [];
const updateProducts = [];
csvProducts.forEach((csvProduct) => {
// check if there is a match
const match = dbProducts.find((dbProduct) => {
return dbProduct.channel === csvProduct.channel && moment(dbProduct.date).isSame(moment(csvProduct.date), 'day') && dbProduct.start_time === csvProduct.start_time;
});
if (match) {
// we found a match, add it to updateProducts array
updateProducts.push({
id: match.id,
...csvProduct
});
// remove the match from the dbProducts array to speed things up
_.pull(dbProducts, match);
} else {
// no match, it's a new product
newProducts.push(csvProduct);
}
});
I am using lodash and moment.js libraries.
The bottleneck is in the check if there is a match, any ideas on how to speed this up?
This is a job for the Map collection class. Arrays are a hassle because they must be searched linearly. Maps (and Sets) can be searched fast. You want to do your matching in RAM rather than hitting your db for every single object in your incoming file.
So, first read every record in your database and construct a Map whose keys are strings combining start_time, date and channel, and whose values are the id. A Map compares object keys by identity, so two separately created objects like {start_time, date, channel} never match each other even when their fields are equal; a composite string key avoids that.
Something like this sketch (it assumes the '|' separator never occurs in start_time or channel).
const productsInDb = new Map();
for (const entry of dbProducts) {
// make your keys EXACTLY the same way when you load your Map ..
const key = `${entry.start_time}|${moment(entry.date).format('YYYY-MM-DD')}|${entry.channel}`;
productsInDb.set(key, entry.id);
}
This will take a whole mess of RAM, but so what? It's what RAM is for.
Then do your matching more or less the way you did it in your example, but using your Map.
const newProducts = [];
const updateProducts = [];
csvProducts.forEach((csvProduct) => {
// check if there is a match
// ...and when you look up entries in the Map.
const key = `${csvProduct.start_time}|${moment(csvProduct.date).format('YYYY-MM-DD')}|${csvProduct.channel}`;
const id = productsInDb.get(key);
if (id !== undefined) {
// we found a match, add it to updateProducts array
updateProducts.push({
id: id,
...csvProduct
});
// don't bother to update your Map here
// unless you need to do something about dups in your csv file
} else {
// no match, it's a new product
newProducts.push(csvProduct)
}
});
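How a Map treats its keys can be checked in isolation. The sketch below is only an illustration: makeKey is a hypothetical helper, the '|' separator is assumed not to occur in the fields, and the calendar day is taken in UTC with the built-in Date rather than with moment, so it may differ from a local-time comparison around midnight.

```javascript
// Two objects with equal fields are still two different Map keys,
// because Map compares object keys by identity.
const byObject = new Map();
byObject.set({ channel: 'a', start_time: '20:00' }, 1);
const objectHit = byObject.get({ channel: 'a', start_time: '20:00' }); // undefined

// A composite string key is compared by value.
// makeKey is a hypothetical helper, not part of the question's code.
function makeKey(product) {
  const day = new Date(product.date).toISOString().slice(0, 10); // UTC calendar day
  return `${product.start_time}|${day}|${product.channel}`;
}

const byString = new Map();
byString.set(makeKey({ channel: 'a', date: '2020-01-05T10:00:00Z', start_time: '20:00' }), 42);

// Same channel, same day, same start time, but a different time of day: still a hit.
const stringHit = byString.get(
  makeKey({ channel: 'a', date: '2020-01-05T23:00:00Z', start_time: '20:00' })
); // 42
```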

AppScript: 'number of columns in the data does not match the number of columns in the range.' setValues method not reading array correctly?

I'm trying to automate the collection of phone numbers from an API into a Google Sheet with app script. I can get the data and place it in an array with the following code:
const options = {
method: 'GET',
headers: {
Authorization: 'Bearer XXXXXXXXXXXXXXX',
Accept: 'Application/JSON',
}
};
var serviceUrl = "dummyurl.com/?params";
var data=UrlFetchApp.fetch(serviceUrl, options);
if(data.getResponseCode() == 200) {
var response = JSON.parse(data.getContentText());
if (response !== null){
var keys = Object.keys(response.call).length;
var phoneArray = [];
for(i = 0; i < keys; i++) {
phoneArray.push(response.call[i].caller.caller_id);
}
This works as expected - it grabs yesterday's caller ID values from a particular marketing campaign from my API. Next, I want to import this data into a column in my spreadsheet. To do this, I use the setValues method like so:
Logger.log(phoneArray);
var arrayWrapper = [];
arrayWrapper.push(phoneArray);
Logger.log(arrayWrapper);
for(i = 0; i < keys; i++) {
var sheet = SpreadsheetApp.getActiveSpreadsheet().getActiveSheet();
var cell = sheet.getRange("A8");
cell.setValues(arrayWrapper);
}
}
}
}
I am aware that I need my array length to equal the length of the selected range of cells in my sheet. However, I get conflicting errors depending on the length I set for my getRange method. If I set it to a single cell, as you see above, the error I get is:
The number of columns in the data does not match the number of columns in the range. The data has 8 but the range has 1.
However, if I set the length of my range to 8 (or any value except 1), I get the error:
The number of columns in the data does not match the number of columns in the range. The data has 1 but the range has 8.
As you see, the error swaps values. Now I have the appropriate number of columns in the range, but my script only finds 1 cell of data. When I check the log, I see that my 2D array looks normal in both cases - 8 phone numbers in an array wrapped in another array.
What is causing this error? I cannot find reference to similar errors on SO or elsewhere.
Also, please note that I'm aware this code is a little wonky (weird variables and two for loops where one would do). I've been troubleshooting this for a couple hours and was originally using setValue instead of setValues. While trying to debug it, things got split up and moved around a lot.
setValues() expects a two-dimensional array whose outer length equals the number of rows in the range and whose inner length equals the number of columns
An array of single-element arrays, [[...],[...],[...]], is one column and multiple rows
A single inner array, [[...,...,...]], is one row and multiple columns, which is what your arrayWrapper holds, so it only fits a range that is one row high and as many columns wide as you have values
To build such a row explicitly, create a two-dimensional array and push all entries into its first row: phoneArray[0]=[]; phoneArray[0].push(...);
Sample:
var phoneArray = [];
phoneArray[0]=[];
for(i = 0; i < keys; i++) {
var phoneNumber = response.call[i].caller.caller_id;
phoneNumber = phoneNumber.replace(/-/g,'');
phoneArray[0].push(phoneNumber);
}
var range = sheet.getRange(1,8,1, keys);
range.setValues(phoneArray);
So I figured out how to make this work, though I can't say why the error occurs, or rather why the error message reverses depending on the getRange value.
Rather than pushing the whole list of values from the API to phoneArray, I structured my first for loop to reset the value of phoneArray each loop and push a single value array to my arrayWrapper, like so:
for(i = 0; i < keys; i++) {
var phoneArray = [];
var phoneNumber = response.call[i].caller.caller_id;
phoneNumber = phoneNumber.replace(/-/g,'');
phoneArray.push(phoneNumber);
arrayWrapper.push(phoneArray);
}
Note that I also edited the formatting of the phone numbers to suit my needs, so I pulled each value into a variable to make replacing a character simple. What this new for loop results in is a 2D array like so:
[[1235556789],[0987776543],[0009872345]]
Rather than what I had before, which was like this:
[[1235556789,0987776543,0009872345]]
It would appear that this is how the setValues method wants its data structured, although the documentation suggests otherwise.
Regardless, if anyone were to run into similar issues, this is the gist of what must be done to fix it, or at least the method I found worked. I'm sure there are far more performant and elegant solutions than mine, but I will be dealing with dozens of rows of data, not thousands or millions. Performance isn't a big concern for me.
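The two shapes discussed in this thread can be produced from a flat list as in the sketch below. It is plain JavaScript without any Apps Script calls, and the helper names toColumn, toRow and shapeOf are invented for illustration.

```javascript
// One inner array per row, one value per inner array: N rows x 1 column.
function toColumn(values) {
  return values.map((value) => [value]);
}

// A single inner array holding everything: 1 row x N columns.
function toRow(values) {
  return [values.slice()];
}

// [rows, columns] of a rectangular 2D array; the target range must have
// exactly this many rows and columns for setValues() to accept the data.
function shapeOf(grid) {
  return [grid.length, grid.length > 0 ? grid[0].length : 0];
}

const phones = ['1235556789', '0987776543', '0009872345'];
const column = toColumn(phones); // [['1235556789'], ['0987776543'], ['0009872345']]
const row = toRow(phones);       // [['1235556789', '0987776543', '0009872345']]
```

In Apps Script the shape can then size the range, for example sheet.getRange(8, 1, rows, cols) for data starting at A8, since getRange takes the start row, start column, number of rows and number of columns in that order.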
var correct = [[data],[data]]
is the data structure that setValues() requires for a single column, therefore:
?.setValues(correct)

Using setvalues with a for loop

I am currently trying to perform a "copyTo" or "setValues" action.
I need to copy one grid (Sheet1!J16:Q36) to another grid (Sheet2!J16:Q36)
But not every cell shall be copied. Only those values that are not identical to the Sheet1 values shall be copied.
I have tried the below code with success, but sadly the script takes ages.
I understand that a batch operation with getValues in an array will be quicker, but I lack the capability to do that script.
I also used a third grid which compared the values of sheet1 and 2 and returned 1 or 0. Only if the value 1 was shown, the cell was considered by the for loop. I take it that this is inefficient.
Thank you for your help. I appreciate it a lot.
var ratenprogramm = SpreadsheetApp.getActiveSpreadsheet();
var ratenprogrammmain = ratenprogramm.getSheetByName("Ratenprogramm");
var vorlageratenprogramm =
ratenprogramm.getSheetByName("VorlageRatenprogramm");
for(i=1;i<=21;i++)
{
for(j=1;j<=8;j++)
{
if(vorlageratenprogramm.getRange(37+i,9+j).getValue() == 1)
{
vorlageratenprogramm.getRange(15+i,9+j).copyTo(ratenprogrammmain.getRange(15+i,9+j),{contentsOnly: true});
}
}
}
As you have noticed, calls to external services, including methods like getValue(), make your script slow; see the Apps Script Best Practices.
Your code can be optimized by replacing the multiple getValue() requests with a single getValues().
Within the nested loops you can collect the ranges and values to be written, then write them all at once into the corresponding ranges of the destination sheet with the Advanced Sheets Service, using the Sheets API method spreadsheets.values.batchUpdate.
Sample
function myFunction() {
var ratenprogramm = SpreadsheetApp.getActiveSpreadsheet();
var vorlageratenprogramm = ratenprogramm.getSheetByName("VorlageRatenprogramm");
// J16:Q36 holds the values to copy, J38:Q58 holds the 1/0 comparison flags
var range = vorlageratenprogramm.getRange(16,10,21,8);
var values = range.getValues();
var flags = vorlageratenprogramm.getRange(38,10,21,8).getValues();
var data = [];
for (var i=0;i<21;i++)
{
for (var j=0;j<8;j++)
{
if (flags[i][j] == 1)
{
// getCell() is 1-based and relative to the range
var cell = range.getCell(i+1,j+1).getA1Notation();
data.push({ range: 'Ratenprogramm!' + cell, values: [[values[i][j]]] });
}
}
}
var resource = {
valueInputOption: "USER_ENTERED",
data: data
};
// requires the Advanced Sheets Service to be enabled for the script
Sheets.Spreadsheets.Values.batchUpdate(resource, ratenprogramm.getId());
}
Keep in mind that if you have many different ranges, it might be easier and faster to overwrite the complete range rather than using nested loops, e.g.
vorlageratenprogramm.getRange(16,10,21,8).copyTo(ratenprogrammmain.getRange(16,10,21,8),{contentsOnly: true});
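If the third grid with the 1/0 flags is not wanted, the comparison itself can also be done on the two arrays that getValues() returns. The following is a minimal sketch in plain JavaScript (no Apps Script calls), assuming both grids have the same size and hold primitive values; columnLetters and collectChangedCells are hypothetical helper names.

```javascript
// Converts a 1-based column number to its A1 letters (1 -> A, 27 -> AA).
function columnLetters(column) {
  let letters = '';
  while (column > 0) {
    const rem = (column - 1) % 26;
    letters = String.fromCharCode(65 + rem) + letters;
    column = Math.floor((column - 1) / 26);
  }
  return letters;
}

// source and target are same-sized 2D arrays, shaped like getValues() results.
// startRow/startCol are the 1-based sheet coordinates of their top-left cell.
// Caveat: values are compared with !==, so Date objects would always count as changed.
function collectChangedCells(source, target, sheetName, startRow, startCol) {
  const data = [];
  source.forEach((row, i) => {
    row.forEach((value, j) => {
      if (value !== target[i][j]) {
        const a1 = columnLetters(startCol + j) + (startRow + i);
        data.push({ range: `${sheetName}!${a1}`, values: [[value]] });
      }
    });
  });
  return data;
}
```

The returned array has the shape of the data field used in a spreadsheets.values.batchUpdate request, so all changed cells could be sent in one call.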

Non-blocking array reduce in NodeJS?

I have a function that takes in two very large arrays. Essentially, I am matching up orders with items that are in a warehouse available to fulfill that order. The order is an object that contains a sub array of objects of order items.
Currently I am using a reduce function to loop through the orders, then another reduce function to loop through the items in each order. Inside this nested reduce, I am doing a filter on items a customer returned so as not to give the customer a replacement with the item they just sent back. I am then filtering the large array of available items to match them to the order. The large array of items is mutable since I need to mark an item used and not assign it to another item.
Here's some pseudocode of what I am doing.
orders.reduce(accum, currentOrder)
{
currentOrder.items.reduce(internalAccum, currentItem)
{
const prevItems = prevOrders.filter(po => po.customerId === currentOrder.customerId);
const availItems = staticItems.filter(si => si.itemId === currentItem.itemId && !prevItems.includes(currentItem.labelId));
// Logic to assign the item to the order
}
}
All of this is running in a MESOS cluster on my server. The issue I am having is that my MESOS system is doing a health check every 10 seconds. During this working of the code, the server will stop responding for a short period of time (up to 45 seconds or so). The health check will kill the container after 3 failed attempts.
I need to find some way to do this complex looping without blocking the response to the health check. I have tried moving everything to eachSeries from the async library, but it still locks up. I have to do the work in order or I would have used something like async.each or async.eachLimit; if the items are not processed in order, the same item might be assigned to several orders simultaneously.
You can do batch processing here with a promisified setImmediate so that incoming events can have a chance to execute between batches. This solution requires async/await support.
async function batchReduce(list, limit, reduceFn, initial) {
let result = initial;
let offset = 0;
while (offset < list.length) {
const batchSize = Math.min(limit, list.length - offset);
for (let i = 0; i < batchSize; i++) {
result = reduceFn(result, list[offset + i]);
}
offset += batchSize;
await new Promise(setImmediate);
}
return result;
}
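A usage sketch, with batchReduce repeated so the snippet runs on its own. The orders data and the demo function are made up for illustration; the point is that items are still reduced strictly in order, with the event loop getting a turn after every batch. The limit is assumed to be a positive integer.

```javascript
async function batchReduce(list, limit, reduceFn, initial) {
  let result = initial;
  let offset = 0;
  while (offset < list.length) {
    const batchSize = Math.min(limit, list.length - offset);
    for (let i = 0; i < batchSize; i++) {
      result = reduceFn(result, list[offset + i]);
    }
    offset += batchSize;
    // Yield to the event loop so pending callbacks (e.g. a health check) can run.
    await new Promise((resolve) => setImmediate(resolve));
  }
  return result;
}

// Hypothetical usage: collect order ids in batches of 2.
async function demo() {
  const orders = [{ id: 1 }, { id: 2 }, { id: 3 }, { id: 4 }, { id: 5 }];
  return batchReduce(orders, 2, (seen, order) => seen.concat(order.id), []);
}
```

A smaller limit yields more often but adds scheduling overhead, so the batch size would need tuning against how long one reduce step takes.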

Sorted array: how to get position before and after using name? as3

I have been working on a project and Stack Overflow has helped me with a few problems so far, so I am very thankful!
My question is this:
I have an array like this:
var records:Object = {};
var arr:Array = [
records["nh"] = { medinc:66303, statename:"New Hampshire"},
records["ct"] = { medinc:65958, statename:"Connecticut"},
records["nj"] = { medinc:65173, statename:"New Jersey"},
records["md"] = { medinc:64596, statename:"Maryland"},
etc... for all 50 states. And then I have the array sorted reverse numerically (descending) like this:
arr.sortOn("medinc", Array.NUMERIC);
arr.reverse();
Can I call the name of the record (i.e. "nj" for new jersey) and then get the value from the numeric position above and below the record in the array?
Basically, medinc is the median income of US states, and I am trying to show a ranking system... a user would click Texas for example, and it would show the medinc value for Texas, along with the state that ranks one position below and the state that ranks one position above in the array.
Thanks for your help!
If you know the object, you can use indexOf() on the sorted array.
var index:int = arr.indexOf(records["nj"]);
var above:Object;
var below:Object;
if(index + 1 < arr.length){ //make sure you're not already at the end of the array
above = arr[index+1];
}
if(index > 0){ //make sure you're not already at the start of the array
below = arr[index-1];
}
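I think this is the answer based on my understanding of your data.
var index:int = arr.indexOf(records["nh"]);
That will get you the index of the record that was clicked on. Since the array is sorted in descending order, the higher-ranked record sits one position before it and the lower-ranked record one position after it:
var clickedRecord:Object = arr[index];
var higherRecord:Object = arr[index - 1]; // nothing there if the clicked record ranks first
var lowerRecord:Object = arr[index + 1]; // nothing there if the clicked record ranks last
Hope that answers your question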
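The same neighbour lookup, written as a small JavaScript sketch rather than ActionScript so that it can be run and checked directly. Only three states are included, and rankNeighbors is a hypothetical helper name.

```javascript
// Records keyed by state code; only three states shown for brevity.
const records = {
  nh: { medinc: 66303, statename: 'New Hampshire' },
  ct: { medinc: 65958, statename: 'Connecticut' },
  nj: { medinc: 65173, statename: 'New Jersey' },
};

// Descending by income, so index 0 is the top-ranked state.
const ranked = Object.values(records).sort((a, b) => b.medinc - a.medinc);

// Returns the 1-based rank plus the neighbours, or null if the record is not in the array.
function rankNeighbors(rankedList, record) {
  const index = rankedList.indexOf(record);
  if (index === -1) return null;
  return {
    rank: index + 1,
    higher: index > 0 ? rankedList[index - 1] : null,
    lower: index + 1 < rankedList.length ? rankedList[index + 1] : null,
  };
}
```

Because the array and the keyed object hold references to the same record objects, indexOf finds the clicked record without comparing any fields.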
Do you really need records to be a hash?
If not, you can simply move the key into a field of each record and make records a plain array:
var records: Array = new Array();
records.push({ short: "nh", medinc:66303, statename:"New Hampshire"});
records.push({ short: "ct", medinc:65958, statename:"Connecticut"});
....
This gives you the opportunity to create a class for State, change Array to Vector and make all of this type-safe, which is always good.
If you really need those keys, you can add objects like the ones above (with the "short" field) the same way you are doing it now (maybe using a helper function that avoids typing the short name twice, like addState(records, data) { records[data.short] = data }).
Finally, you can also keep those records in two containers (an object and an array, or whatever you need). This is not expensive if you create each state object once and keep references to it in the array/object/vector. It is a nice idea if you often need the states sorted on different keys.
This is not really a good way to set up your data: there is too much typing (you repeat "records", "medinc", "statename" over and over again), which you could have avoided, for example:
var records:Array = [];
var states:Array = ["nh", "ct", "nj" ... ];
var statenames:Array = ["New Hampshire", "Connecticut", "New Jersey" ... ];
var medincs:Array = [66303, 65958, 65173 ... ];
var hash:Object = { };
function addState(state:String, medinc:int, statename:String, hash:Object):Object
{
return hash[state] = { medinc: medinc, statename: statename };
}
for (var i:int; i < 50; i++)
{
records[i] = addState(states[i], medincs[i], statenames[i], hash);
}
Since you have already set it up your way this is not essential, but it could have saved you some keystrokes.
Now, on to your search problem. First of all, it is true that sorting the array before searching is worthwhile, but if you search the array by the value of the field it was sorted on, there is a better algorithm than a linear scan. That is, if your task, given the data in your example, were to find out which state has an income of 65958, then, knowing that the array is sorted on income, you could employ binary search.
For the example with 50 states the difference will not be noticeable unless you do it some hundreds of thousands of times per second, but in general binary search is the way to go.
If the Wikipedia article looks too long to read ;), the idea behind binary search is this: first guess that the searched value is exactly in the middle of the array. If the guess is correct, return that index; otherwise select the half of the remaining array that must contain the searched value and repeat, until you either find the value or the interval becomes empty, which means the value is not present. This reduces the asymptotic complexity of the search from O(n) to O(log n).
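For an array sorted on income in descending order, as in the question, the idea could be sketched like this in JavaScript. findByIncome is a hypothetical name, and the four records are taken from the question's sample data.

```javascript
// Binary search over an array sorted by `medinc` in DESCENDING order.
// Returns the index of a record with that income, or -1 if none exists.
function findByIncome(rankedList, income) {
  let lo = 0;
  let hi = rankedList.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >>> 1;
    const current = rankedList[mid].medinc;
    if (current === income) return mid;
    if (current > income) lo = mid + 1; // smaller incomes sit further right
    else hi = mid - 1;                  // larger incomes sit further left
  }
  return -1;
}

const byIncome = [
  { medinc: 66303, statename: 'New Hampshire' },
  { medinc: 65958, statename: 'Connecticut' },
  { medinc: 65173, statename: 'New Jersey' },
  { medinc: 64596, statename: 'Maryland' },
];
const connecticutIndex = findByIncome(byIncome, 65958); // 1
```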
Now, if your goal were to find the state for a given income, and it did not matter how that state ranks against the others (i.e. the index in the array is not important), you could keep another hash table where the income is the key and the state information object is the value. Using my example above:
function addState(state:String, medinc:int, statename:String,
hash:Object, incomeHash:Object):Object
{
return incomeHash[medinc] =
hash[state] = { medinc: medinc, statename: statename };
}
Then incomeHash[medinc] would give you the state by income in O(1) time.

Resources