How to remove redundant data with skip and limit in MongoDB

When we fetch paginated data from the database with skip and limit, there is a high chance that the results will be redundant.
Let me explain with an example.
Suppose you are fetching student records that belong to some state X, and you have already fetched 10 records. If, between the first and second request, a student record is inserted, deleted, or updated, then in the next query either one row will appear again or a newly inserted row will be skipped.
How do we solve such a case?

Method - 1
It can be resolved by sending the 'Created_by' and 'Updated_by' times from the UI and filtering the data on the server according to them before sending it back.
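As a rough sketch of that idea (not the original poster's code), assume a students collection with state and createdAt fields and the official MongoDB Node.js driver: the UI sends the timestamp of its first page request with every subsequent request, and the server only returns documents created up to that time, so later inserts cannot shift the skip/limit window. The helper name fetchPage and the database/collection names are illustrative.

import { MongoClient } from "mongodb";

// Hypothetical helper: fetch one page of students for a state, excluding
// anything created after the client's first request, so later inserts
// cannot shift the skip/limit window.
async function fetchPage(
    client: MongoClient,
    state: string,
    firstRequestAt: Date,                 // sent by the UI with every page request
    page: number,
    pageSize = 10
) {
    return client
        .db("school")                     // illustrative database name
        .collection("students")           // illustrative collection name
        .find({ state, createdAt: { $lte: firstRequestAt } })
        .sort({ createdAt: 1, _id: 1 })   // stable order so skip/limit stays deterministic
        .skip(page * pageSize)
        .limit(pageSize)
        .toArray();
}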
Method - 2
In the second fetch request, and in every request after it, skip one less than you normally would and increase the limit by 1 (yes, this is correct, think about it), then pass that data to the UI. The UI then checks that the last item of its current list matches the first item of the new response (just compare by ids). If they match, it means no record was added or removed after the first query. If they do not match, fetch the complete data from first to last in a single query.
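A minimal client-side sketch of that overlap check, where Student and getPage are illustrative stand-ins for your own row type and API call:

// Hypothetical helper illustrating Method 2's overlap-by-one check.
interface Student { _id: string; name: string }

async function fetchNextPage(
    current: Student[],                                            // rows the UI already holds
    pageSize: number,
    getPage: (skip: number, limit: number) => Promise<Student[]>   // wraps the server request
): Promise<Student[]> {
    if (current.length === 0) {
        return getPage(0, pageSize);                               // first page: nothing to compare yet
    }

    // Skip one less and ask for one more, so the response overlaps the current list by one row.
    const overlapping = await getPage(current.length - 1, pageSize + 1);

    if (overlapping[0]?._id === current[current.length - 1]._id) {
        // The overlap row matches: nothing was inserted or removed before our window,
        // so drop the overlap row and append the rest.
        return current.concat(overlapping.slice(1));
    }

    // The overlap row does not match: the window shifted, so refetch everything in one query.
    return getPage(0, current.length + pageSize);
}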
Neither method is rock solid on its own (each misses some corner cases), but if you combine both methods the result is rock solid.

Related

Laravel skip and delete records from Database

I'm developing an app which needs to record a list of a user's recent video uploads. Importantly, it needs to remember only the last two videos associated with the user, so I'm trying to find a way to keep just the last two records in the database.
What I've got so far is below. It creates a new record correctly; however, I then want to delete all records older than the two most recent ones.
The problem is that this seems to delete ALL records, even though, by my understanding, the skip should leave out the two most recent records.
private function saveVideoToUserProfile($userId, $thumb ...)
{
    RecentVideos::create([
        'user_id' => $userId,
        'thumbnail' => $thumb,
        ...
    ]);

    RecentVideos::select('id')->where('user_id', $userId)->orderBy('created_at')->skip(2)->delete();
}
Can anyone see what I'm doing wrong?
Limit and offset do not work with delete, so you can do something like this:
$ids = RecentVideos::select('id')->where('user_id', $userId)->orderByDesc('created_at')->skip(2)->take(10000)->pluck('id');
RecentVideos::whereIn('id', $ids)->delete();
First off, skip() does not skip the x most recent records, but rather the first x records of the result set. So in order to get your desired result, you need to sort the data in the correct order. orderBy() defaults to ascending order, but it accepts a second direction argument. Try orderBy('created_at', 'DESC'). (See the docs on orderBy().)
This is how I would recommend writing the query.
RecentVideos::where('user_id', $userId)->orderBy('created_at', 'DESC')->skip(2)->delete();

Talend avoid duplicate external ID with Salesforce Output

We are importing data into Salesforce through Talend and we have multiple items with the same internal id.
Such an import fails with the error "Duplicate external id specified" because of how upsert works in Salesforce. At the moment, we worked around that by setting the commit size of tSalesforceOutput to 1, but that only works for a small amount of data; otherwise it would exhaust the Salesforce API limits.
Is there a known approach to this in Talend? For example, to ensure that items that have the same external ID end up in different "commits" of tSalesforceOutput?
Here is the design for the solution I wish to propose:
tSetGlobalVar is here to initialize the variable "finish" to false.
tLoop starts a while loop with (Boolean)globalMap.get("finish") == false as an end condition.
tFileCopy is used to copy the initial file (A for example) to a new one (B).
tFileInputDelimited reads file B.
tUniqRow eliminates duplicates. Unique records go to tLogRow, which you have to replace with tSalesforceOutput. Duplicate records, if any, go to a tFileOutputDelimited called A (the same name as the original file), with the option "Throw an error if the file already exists" unchecked.
An OnComponentOk link after tUniqRow activates the tJava, which sets the new value of the global "finish" variable with the following code:
if (((Integer)globalMap.get("tUniqRow_1_NB_DUPLICATES")) == 0) globalMap.put("finish", true);
Explanation with the following sample data:
line 1
line 2
line 3
line 2
line 4
line 2
line 5
line 3
On the 1st iteration, 5 unique records are pushed into tLogRow, 3 duplicates are pushed into file A, and "finish" is not changed as there are still duplicates.
On the 2nd iteration, the operations are repeated for 2 unique records and 1 duplicate.
On the 3rd iteration, the operations are repeated for 1 unique record, and as there are no more duplicates, "finish" is set to true and the loop finishes.
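For readers who find the loop easier to follow in code than in the job design, here is the same iterative logic sketched in TypeScript (illustration only; splitByExternalId plays the role of tUniqRow, and upsertBatch is a hypothetical stand-in for tSalesforceOutput):

interface Row { externalId: string; [key: string]: unknown }

// Keep the first occurrence of each external id as "unique" and send the
// remaining occurrences back into the next iteration, like tUniqRow writing
// duplicates back to file A.
function splitByExternalId(rows: Row[]): { uniques: Row[]; duplicates: Row[] } {
    const seen = new Set<string>();
    const uniques: Row[] = [];
    const duplicates: Row[] = [];
    for (const row of rows) {
        if (seen.has(row.externalId)) {
            duplicates.push(row);
        } else {
            seen.add(row.externalId);
            uniques.push(row);
        }
    }
    return { uniques, duplicates };
}

async function upsertWithoutDuplicates(rows: Row[], upsertBatch: (batch: Row[]) => Promise<void>) {
    let remaining = rows;
    while (remaining.length > 0) {                      // the tLoop "while finish == false"
        const { uniques, duplicates } = splitByExternalId(remaining);
        await upsertBatch(uniques);                     // each batch contains no repeated external id
        remaining = duplicates;                         // re-process the leftovers in the next round
    }
}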
You can also decide to use another global variable to set the Salesforce commit level (using the syntax (Integer)globalMap.get("commitLevel")). This variable will be set to 200 by default and to 1 in the tJava if there are any duplicates. At the same time, set "finish" to true (without testing the number of duplicates) and you'll have a commit level of 200 for the 1st iteration and 1 for the 2nd (and no more than 2 iterations are needed).
You can decide which is the better choice depending on the number of potential duplicates, but notice that you can do it without any change to the job design.
I think it should solve your problem. Let me know.
Regards,
TRF
Do you mean you have the same record (the same account, for example) twice or more in the input? If so, can't you try to eliminate the duplicates and keep only the record you need to push to Salesforce? Otherwise, if each record carries specific information (so you need all the input records to build a complete one in Salesforce), consider merging the records before pushing the result into Salesforce.
And finally, if you can't do that, push the duplicates into a temporary space, push all records except the duplicates into Salesforce, and iterate this process until there are no more duplicates. Personally, if you can't just eliminate the duplicates, I prefer the 2nd approach as it is the solution with fewer Salesforce API calls.
Hope this helps.
TRF

Get current length of records after filter selection

I started using ag-grid and I need to make some changes in my app. One of them: when the data is loaded in the table, my footer shows the number of records, but when I start filtering the data in the table, the number of current rows starts decreasing. Can I get the current number of records when filtering the data? Is there a method where I can get the currently filtered data?
An example of what I want to achieve:
Data Loaded:
10/10
Data after being filtered:
7/10
Use api.forEachNodeAfterFilter(callback) to iterate through all the rows, counting as you go. It is mentioned in the API docs:
https://www.ag-grid.com/javascript-grid-api/index.php
To know when the row count has changed, listen to the modelUpdated event.
This may be a duplicate of: How to get the number of filtered rows in ag-Grid?
The answer there is:
gridOptions.api.getDisplayedRowCount()
You can use the code below to get the filtered row count:
gridOptions.api.getModel().rootNode.childrenAfterFilter.length
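A small sketch showing the getDisplayedRowCount / modelUpdated approach in context, assuming ag-grid-community and a footer element with id "footer" (exact API access can differ between ag-grid versions):

import { GridOptions, ModelUpdatedEvent } from "ag-grid-community";

// Illustrative data; replace with your own rowData and columnDefs.
const rowData = [{ name: "Row A" }, { name: "Row B" }];
const columnDefs = [{ field: "name", filter: true }];
const totalRows = rowData.length;

const gridOptions: GridOptions = {
    columnDefs,
    rowData,
    // modelUpdated fires whenever the set of displayed rows changes (e.g. after filtering).
    onModelUpdated: (event: ModelUpdatedEvent) => {
        const shown = event.api.getDisplayedRowCount();   // rows remaining after filters
        document.getElementById("footer")!.textContent = `${shown}/${totalRows}`;
    },
};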

GWT how to setup Pager in DataGrid when using Objectify Cursors

I recently got it to the point where I can retrieve data with a Cursor (see this link: GWT pass Objectify Cursor from Server to Client with RequestFactory and show more pages in DataGrid)
What I am running into: when I get the data back on the client side, it's only a list of 25. When I set the data in the DataGrid, the pager at the bottom says "showing 1-25 of 25". There are obviously more records in the database; I'm just retrieving 25 of them at a time with the cursor.
What I tried doing is setting the following:
pager.setRangeLimited(false);
Unfortunately, while this allows me to page and select more from the database, it never actually gives me the total amount in the database. What I am wondering is: if I'm using a Cursor on the server side, how do I set the total count in the Pager?
One thing I thought about doing is simply adding a total count variable to the ListCursor wrapper object I'm returning. Unfortunately, this would require that, if I request it with a null initial query, I go through and get the total count every time, which seems horribly inefficient. And then, once I get this back, I still have no idea how to actually tell the pager that more data is available than I actually gave it.
Any help on this would be really appreciated
You set the total count in the pager by telling the pager that the row count is exact:
asyncDataProvider.updateRowCount(int size, boolean exact);
If you don't tell the pager that the row count is exact, then you can obviously not navigate to the last page.
The core issue is how to get hold of the total row count. Querying the row count is indeed highly inefficient. A better bet would be to keep a counter in the data store that tracks the number of records. This can be quite inefficient too, because you have to keep the counter increments synchronized/transactional.
In my project, I don't keep track of the exact row count, but I provide flexible search options instead.

How do travel websites implement the sorting of search results?

For example, you make a search for a hotel in London and get 250 hotels, of which 25 are shown on the first page. On each page the user has an option to sort the hotels by price, name, user reviews, etc. The intelligent thing to do would be to get only the first 25 hotels for the first page from the database. When the user moves to page 2, make another database query for the next 25 hotels and keep the previous results in cache.
Now consider this: the user is on page 1 and sees 25 hotels sorted by price, and then sorts them by user ratings. In this case, we should keep the hotels we already have in cache and only request the additional hotels. How is that implemented? Is there something built into any language (preferably PHP), or do we have to implement it from scratch using multiple queries?
This is usually done as follows:
The query is executed with ORDER BY on the required field and with a TOP (in some databases, LIMIT) set to (page_index + 1) * entries_per_page results. The query returns a random-access rowset (you might also hear this referred to as a resultset or a recordset, depending on the database library you are using) which supports methods such as MoveTo(row_index) and MoveNext(). So, we execute MoveTo(page_index * entries_per_page) and then read and display entries_per_page results. The rowset generally also offers a Count property, which we invoke to get the total number of rows that would be fetched by the query if we ever let it run to the end (which of course we don't), so that we can compute and show the user how many pages exist.
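The same idea sketched against MongoDB (which this page started with), using the Node.js driver; the travel database, hotels collection, and field names are illustrative, and countDocuments plays the role of the rowset's Count property:

import { MongoClient } from "mongodb";

// Hypothetical helper: fetch one page of hotels in a city, sorted by a
// user-chosen field, plus the total match count so the UI can render page links.
async function fetchHotelPage(
    client: MongoClient,
    city: string,
    sortField: "price" | "name" | "rating",   // illustrative sortable fields
    pageIndex: number,
    pageSize = 25
) {
    const hotels = client.db("travel").collection("hotels");
    const filter = { city };
    const sortSpec: { [key: string]: 1 | -1 } = { [sortField]: 1 };   // ascending on the chosen field

    const [rows, total] = await Promise.all([
        hotels.find(filter).sort(sortSpec).skip(pageIndex * pageSize).limit(pageSize).toArray(),
        hotels.countDocuments(filter),        // total number of matches, like the rowset's Count
    ]);
    return { rows, total, pages: Math.ceil(total / pageSize) };
}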
