Entity Framework join on large in-memory list - sql-server

As far as I am aware, doing a join against in-memory data basically loads the entire data set in, e.g.
var ids = new int[] { 1, 2, 3 };
var data = context.Set<Product>().Join(ids, x => x.Id, id => id, (x, id) => x);
This will cause the entire 'Product' table to be loaded into memory. In my current case this is not an option as it contains millions of records.
The common solution I have seen is to use Contains(), e.g.
var ids = new int[] { 1, 2, 3 };
var data = context.Set<Product>().Where(x => ids.Contains(x.Id));
which generates an SQL IN clause like WHERE Id IN (1, 2, 3)
But what if the in-memory list is very large? Does an IN clause with thousands, or tens of thousands, of values create any issues?
Is there some alternative query that creates different (better?) SQL?

Related

How can I efficiently pull a significant number of records from IndexedDb knowing their keys?

So my trouble is that I have a collection of keys of certain records that I keep in my IndexedDb in the Chrome browser; the size of the store (aka table) is nearly 200,000 and the set of keys is about 5,000.
There is no way to use indices, because the keys are very random.
What would be the fastest way to pull these 5000 records corresponding to the keys I have?
My current solution is to cursor through all records in the DB and check if each key is in the set. It's starting to be noticeably slow.
Rather than reading every record, read only the ones with matching keys (assuming by "key" you mean the key of the object store):
// Assumes sortedKeys is sorted ascending and every key exists in the store.
const sortedKeys = [1, 2, 3];
const output = [];
const range = IDBKeyRange.bound(sortedKeys[0], sortedKeys[sortedKeys.length - 1]);
let i = 0;
const request = objectStore.openCursor(range);
request.onsuccess = (event) => {
  const cursor = event.target.result;
  if (!cursor) {
    console.log(output); // done: the cursor has passed the upper bound
    return;
  }
  output.push(cursor.value);
  i += 1;
  // Jump straight to the next requested key instead of visiting
  // every record in between.
  cursor.continue(sortedKeys[i]);
};
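If the keys start out unordered (the question says they are "very random"), sort them with IndexedDB's own key ordering first. A minimal sketch, where the store name and the db handle are assumptions, not from the answer above:
// Hypothetical setup; 'products' and `db` are placeholders.
const keys = [230, 20, 179, 2];
const sortedKeys = [...keys].sort((a, b) => indexedDB.cmp(a, b));
const objectStore = db.transaction('products', 'readonly').objectStore('products');
// ...then run the cursor walk above against this objectStore with sortedKeys.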

Need to insert records into postgres table using array of objects

I have a TypeScript object like this (the properties are made up, but the object is in the form listed below):
shipments = [
  {
    id: 1,
    name: 'test',
    age: 2,
    orderNumber: null
  },
  {
    id: 2,
    name: 'test2',
    age: 4,
    orderNumber: '1434'
  },
]
I need to write a PostgreSQL statement that takes that array and puts it into a table that has the columns id, name, age, and orderNumber. I can't do any iteration on the data (that's why I'm trying to insert an array I already have using one statement: it's way faster than iteration). I need to take that array, without adding any kind of TypeScript manipulation to it, and use it in a PostgreSQL insert statement. Is this possible? To make clearer what I want to do, I want to take the shipments object from above and insert it similar to what this insert statement would do:
INSERT INTO table (id, name, age, orderNumber) VALUES (1, 'test', 2, null), (2, 'test2', 4, '1434')
But more automated such as:
INSERT INTO table (variable_column_list) VALUES shipments_array_with_only_values_not_keys
I saw an example using json_to_recordset, but it wasn't working for me, so the use case may have been different.
This is what I am currently doing, using adonis and multiInsert; however, that only allows 1000 records at a time. I was hoping for all the records in one postgres statement.
await Database.connection(tenant).table(table).multiInsert(shipments)
Thanks in advance for the help!
Are you sure you don't wanna use any ORM / libs for that?
You can generate the SQL from the array like this (not the best solution, just a quick one):
const getQuery = shipments => {
  // Postgres: double-quote identifiers, single-quote string values
  const columns = Object.keys(shipments[0]).map(key => `"${key}"`).join(', ');
  const rows = shipments.map(row => `(${Object.values(row)
    .map(value => value == null ? 'null' : typeof value === 'number' ? value : `'${value}'`)
    .join(', ')})`).join(',\n');
  return `INSERT INTO table (${columns})\nVALUES\n${rows}`;
};
console.log(getQuery(shipments));
Output:
INSERT INTO table ("id", "name", "age", "orderNumber")
VALUES
(1, 'test', 2, null),
(2, 'test2', 4, '1434')
All records will be merged into one insert query, but:
A large amount of data in a single query is unreasonable and causes crashes / freezes, so you still need to chunk the data somehow (this question might be useful):
for (let chunk of chunks) {
  await Database.rawQuery(getQuery(chunk))
}
No dynamic structure here! Each array element should have the same structure, with the same set of keys in exactly the same order:
interface IShipment {
  id: number;
  name: string;
  age: number;
  orderNumber?: string;
} // or whatever
const shipments: IShipment[] = [ ... ] // always should be
Interpolation depends on the type: in shipments.map I interpolate numbers and null(-ish) values as-is, and convert all other types to quoted strings, which is OK for this case but completely wrong in general. For example, an array value should be JSON-stringified, not coerced to a string.
You need to deal with possible injections (see the SQL injection wiki page). For example, with the promise-mysql package you can use the pool.escape() method on your values to prevent injection:
Object.values(row).map(value => pool.escape(value)); // or somehow else
Conclusion:
Pushing all records into one statement is not the best idea, especially when building the SQL yourself.
I suggest chunking & inserting via the adonis connection you already used:
const shipments: IShipment[] = [ ... ] // any amount of records
const chunkSize = 1000; // adonis limit
const chunks: IShipment[][] = shipments.reduce((resultArray, item, index) => {
  const chunkIndex = Math.floor(index / chunkSize);
  if (!resultArray[chunkIndex]) resultArray[chunkIndex] = [];
  resultArray[chunkIndex].push(item);
  return resultArray;
}, []); // split them into chunks
for (let chunk of chunks) {
  await Database.rawQuery(getQuery(chunk)); // insert chunks one-by-one
}
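For reference, since the question mentions json_to_recordset: a hedged, untested sketch of that approach, which passes the whole array as a single bound JSON parameter so no SQL is assembled by hand (column names and types are taken from the sample data; everything else here is an assumption):
const sql = `
  INSERT INTO table (id, name, age, "orderNumber")
  SELECT id, name, age, "orderNumber"
  FROM json_to_recordset(?::json)
    AS s(id int, name text, age int, "orderNumber" text)
`;
await Database.rawQuery(sql, [JSON.stringify(shipments)]);
This sidesteps both the quoting and the injection concerns above, because the driver binds the JSON as one parameter.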

Get Multiple Rows on SQLite in Flutter

I keep only the "productID" information of the products added to favorites in the "favorite" table in the database. I had no problem adding this information to the table (the product table and the favorite table are separate). But when I wanted to list all the favorite products in the table, I could only find code that queries them one by one. What I need is a function that takes a List<int> of product IDs as a parameter and returns the favorite products as a List<Map<String, dynamic>>. I looked almost everywhere but could not find it.
The function I use to fetch favorite product IDs stored in the favorite table:
Future<List<Map<String, dynamic>>> favProds() async {
  List<int> IDs = new List<int>();
  var db = await _getDatabase();
  var result = await db.query(_favTable, orderBy: '$_columnProdID DESC');
  for (Map incomingMap in result) {
    IDs.add(incomingMap['prodID']);
  }
  return _getFavProdsWithIDList(IDs);
}
This is my function that takes the ID List as a parameter and returns a list containing favorite products as maps:
Future<List<Map<String, dynamic>>> _getFavProdsWithIDList(List<int> IDs) async {
  var db = await _getDatabase();
  var result = await db.query("$_prodTable", where: '$_columnProdID = ?', whereArgs: IDs);
  return result;
}
Here is the error I get when I use them:
Unhandled Exception: DatabaseException(Cannot bind argument at index 8 because the index is out of range. The statement has 1 parameters.) sql 'SELECT * FROM urun WHERE urunID = ?' args [230, 180, 179, 20, 19, 18, 17, 2]}
From this error, we see that adding products to the favorite table is successful.
If you want to select records where the id is in a list of ids, you should use a query like
SELECT * FROM urun WHERE urunID IN (1, 2, 3);
You have two options.
Provide the same number of placeholders as the list length:
final placeholders = List.generate(IDs.length, (_) => "?").join(",");
var result = await db.query("$_prodTable", where: '$_columnProdID IN ($placeholders)', whereArgs: IDs);
Or, since the ids are integers, simply inline them:
var result = await db.query("$_prodTable", where: '$_columnProdID IN (${IDs.join(",")})');

Laravel 5.5 - Updating a pivot table with a custom field given two input arrays

I have 2 input arrays, one for ingredients and one for the amount of each ingredient that is required for an associated recipe. My pivot table has four columns: id, recipe_id, ingredient_id and amount. I want to use the sync method to update the pivot table, however I can't work out how to pass the second 'amounts' array's values and ensure they are synced with the correct record.
$ingredients = $request->ingredients;
$ingredientAmounts = $request->ingredients_amount;
$project->ingredients()->sync( $ingredients => ['amount' => $ingredientAmounts] );
The ingredient and its amount will both have the same key so I guess I could loop through them manually and update the pivot table, but I feel like there will be a simpler way which will make better use of eloquent.
The two input arrays need to be merged to be in the format required:
$user->roles()->sync([1 => ['expires' => true], 2, 3]);
From https://laravel.com/docs/5.5/eloquent-relationships#updating-many-to-many-relationships
$array = [];
foreach ($ingredients as $key => $ingredient) {
    $array[$ingredient->id] = ['amount' => $ingredientAmounts[$key]];
}
$project->ingredients()->sync($array);

How to get item count from DynamoDB?

I want to get an item count from a DynamoDB query.
I can query DynamoDB, but I only want to know the total count of the matching items.
For example, 'SELECT COUNT(*) FROM ... WHERE ...' in MySQL
$result = $aws->query(array(
    'TableName' => 'game_table',
    'IndexName' => 'week-point-index',
    'KeyConditions' => array(
        'week' => array(
            'ComparisonOperator' => 'EQ',
            'AttributeValueList' => array(
                array(Type::STRING => $week)
            )
        ),
        'point' => array(
            'ComparisonOperator' => 'GE',
            'AttributeValueList' => array(
                array(Type::NUMBER => $my_point)
            )
        )
    ),
));
echo Count($result['Items']);
This code gets the data of all users with more points than mine. If the count of $result is 100,000, $result is far too big, and it would exceed the limits of the query size. I need help.
With the aws dynamodb cli you can get it via scan as follows:
aws dynamodb scan --table-name <TABLE_NAME> --select "COUNT"
The response will look similar to this:
{
    "Count": 123,
    "ScannedCount": 123,
    "ConsumedCapacity": null
}
Notice that this information is real-time, in contrast to the describe-table API.
You can use the Select parameter and use COUNT in the request. It "returns the number of matching items, rather than the matching items themselves". Important, as brought up by Saumitra R. Bhave in a comment, "If the size of the Query result set is larger than 1 MB, then ScannedCount and Count will represent only a partial count of the total items. You will need to perform multiple Query operations in order to retrieve all of the results".
I'm not familiar with PHP, but here is how you could use it with Java. Then, instead of calling Count (which I am guessing is a PHP function) on the 'Items', you can use the Count value from the response, $result['Count']:
final String week = "whatever";
final Integer myPoint = 1337;
Condition weekCondition = new Condition()
        .withComparisonOperator(ComparisonOperator.EQ)
        .withAttributeValueList(new AttributeValue().withS(week));
Condition myPointCondition = new Condition()
        .withComparisonOperator(ComparisonOperator.GE)
        .withAttributeValueList(new AttributeValue().withN(myPoint.toString()));
Map<String, Condition> keyConditions = new HashMap<>();
keyConditions.put("week", weekCondition);
keyConditions.put("point", myPointCondition);
QueryRequest request = new QueryRequest("game_table");
request.setIndexName("week-point-index");
request.setSelect(Select.COUNT);
request.setKeyConditions(keyConditions);
QueryResult result = dynamoDBClient.query(request);
Integer count = result.getCount();
If you don't need to emulate the WHERE clause, you can use a DescribeTable request and use the resulting item count to get an estimate.
The number of items in the specified table. DynamoDB updates this value approximately every six hours. Recent changes might not be reflected in this value.
Also, an important note from the documentation as noted by Saumitra R. Bhave in the comments on this answer:
If the size of the Query result set is larger than 1 MB, ScannedCount and Count represent only a partial count of the total items. You need to perform multiple Query operations to retrieve all the results (see Paginating Table Query Results).
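A minimal sketch of that request, assuming the AWS SDK for JavaScript v2 (run inside an async function; the table name is from the question):
const AWS = require('aws-sdk');
const dynamodb = new AWS.DynamoDB();
const data = await dynamodb.describeTable({ TableName: 'game_table' }).promise();
console.log(data.Table.ItemCount); // estimate, refreshed approximately every six hours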
It can be seen from the UI as well.
Go to the overview tab of the table and you will see the item count. Hope it helps someone.
I'm too late here, but I'd like to extend Daniel's answer about using the aws cli to include a filter expression.
Running
aws dynamodb scan \
--table-name <tableName> \
--filter-expression "#v = :num" \
--expression-attribute-names '{"#v": "fieldName"}' \
--expression-attribute-values '{":num": {"N": "123"}}' \
--select "COUNT"
would give
{
    "Count": 2945,
    "ScannedCount": 7874,
    "ConsumedCapacity": null
}
That is, ScannedCount is the total number of items scanned and Count is the number of items matching the given expression (fieldName=123).
Replace the table name and use the below query to get the data on your local environment:
aws dynamodb scan --table-name <TABLE_NAME> --select "COUNT" --endpoint-url http://localhost:8000
Replace the table name and remove the endpoint url to get the data on production environment
aws dynamodb scan --table-name <TABLE_NAME> --select "COUNT"
If you happen to reach here, and you are working with C#, here is the code:
var cancellationToken = new CancellationToken();
var request = new ScanRequest("TableName") {Select = Select.COUNT};
var result = context.Client.ScanAsync(request, cancellationToken).Result;
totalCount = result.Count;
If anyone is looking for a straightforward NodeJS Lambda count solution:
const data = await dynamo.scan({ Select: "COUNT", TableName: "table" }).promise();
// data.Count -> number of elements in table.
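A single scan call stops after reading 1 MB of data, so on larger tables the count has to be accumulated across pages. A hedged sketch extending the call above (same assumed DocumentClient named dynamo):
let total = 0;
let startKey = undefined;
do {
  const page = await dynamo.scan({
    Select: "COUNT",
    TableName: "table",
    ExclusiveStartKey: startKey, // undefined on the first call
  }).promise();
  total += page.Count;
  startKey = page.LastEvaluatedKey; // undefined once the scan is complete
} while (startKey);
// total -> number of elements in the table, including pages past 1 MB.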
I'm posting this answer for anyone using C# that wants a fully functional, well-tested answer that demonstrates using query instead of scan. In particular, this answer handles more than 1MB size of items to count.
public async Task<int> GetAvailableCount(string pool_type, string pool_key)
{
    var queryRequest = new QueryRequest
    {
        TableName = PoolsDb.TableName,
        ConsistentRead = true,
        Select = Select.COUNT,
        KeyConditionExpression = "pool_type_plus_pool_key = :type_plus_key",
        ExpressionAttributeValues = new Dictionary<string, AttributeValue> {
            {":type_plus_key", new AttributeValue { S = pool_type + pool_key }}
        },
    };
    var t0 = DateTime.UtcNow;
    var result = await Client.QueryAsync(queryRequest);
    var count = result.Count;
    var iter = 0;
    while (result.LastEvaluatedKey != null && result.LastEvaluatedKey.Values.Count > 0)
    {
        iter++;
        var lastkey = result.LastEvaluatedKey.Values.ToList()[0].S;
        _logger.LogDebug($"GetAvailableCount {pool_type}-{pool_key} iteration {iter} instance key {lastkey}");
        queryRequest.ExclusiveStartKey = result.LastEvaluatedKey;
        result = await Client.QueryAsync(queryRequest);
        count += result.Count;
    }
    _logger.LogDebug($"GetAvailableCount {pool_type}-{pool_key} returned {count} after {iter} iterations in {(DateTime.UtcNow - t0).TotalMilliseconds} ms.");
    return count;
}
DynamoDB now has a 'Get Live Item Count' button in the UI. Note the caveat for production use: on a large table this will consume read capacity.
In Scala:
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder
import com.amazonaws.services.dynamodbv2.document.DynamoDB

val client = AmazonDynamoDBClientBuilder.standard().build()
val dynamoDB = new DynamoDB(client)
val itemCount = dynamoDB.getTable("table name").describe().getItemCount() // ~6-hourly estimate
Similar to the Java answer, in PHP just set the Select parameter to 'COUNT':
$result = $aws->query(array(
    'TableName' => 'game_table',
    'IndexName' => 'week-point-index',
    'KeyConditions' => array(
        'week' => array(
            'ComparisonOperator' => 'EQ',
            'AttributeValueList' => array(
                array(Type::STRING => $week)
            )
        ),
        'point' => array(
            'ComparisonOperator' => 'GE',
            'AttributeValueList' => array(
                array(Type::NUMBER => $my_point)
            )
        )
    ),
    'Select' => 'COUNT'
));
and access it just like this:
echo $result['Count'];
But as Saumitra mentioned above, be careful with result sets larger than 1 MB; in that case keep using LastEvaluatedKey until it returns null, and sum the Count values to get the full count.
Adding some additional context to this question. In some circumstances it makes sense to Scan the table to obtain the live item count. However, if this is a frequent occurrence, or if you have large tables, it can be expensive from both a cost and a performance point of view. Below, I highlight three ways to get the item count for your tables.
1. Scan
Using a Scan requires you to read every item in the table; this works well for one-off queries, but it is not scalable and can become quite expensive. Using Select: COUNT will prevent returning data, but you must still pay for reading the entire table.
Pros
Gets you the most recent item count ("live")
Is a simple API call
Can be run in parallel to reduce time
Cons
Reads the entire dataset
Slow performance
High cost
CLI example
aws dynamodb scan \
--table-name test \
--select COUNT
2. DescribeTable
The DynamoDB DescribeTable API provides you with an estimated value for ItemCount, which is updated approximately every 6 hours.
The number of items in the specified table. DynamoDB updates this value approximately every six hours. Recent changes might not be reflected in this value. Ref.
Calling this API gives you an instant response; however, the value of ItemCount could be up to 6 hours stale. In certain situations this value may be adequate.
Pros
Instant response
No cost to retrieve ItemCount
Can be called frequently
Cons
Data could be stale by up to 6 hours.
CLI Example
aws dynamodb describe-table \
--table-name test \
--query Table.ItemCount
3. DescribeTable and CloudWatch
As previously mentioned, DescribeTable updates your table's ItemCount approximately every 6 hours. We can obtain that value and plot it on a custom CloudWatch graph, which allows you to monitor your table's ItemCount over time, giving you historical data.
Pros
Provides historical data
Infer how your ItemCount changes over time
Reasonably easy to implement
Cons
Data could be stale by up to 6 hours.
Implementation
Tracking DynamoDB Storage History with CloudWatch showcases how to automatically push the value from DescribeTable to CloudWatch periodically using EventBridge and Lambda; however, it is designed to push TableSizeBytes instead of ItemCount. Some small modifications to the Lambda (see the sketch below) will allow you to record ItemCount:
Change this line from TableSizeBytes to ItemCount
Remove line 18 to line 27
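A hedged TypeScript sketch of the overall idea (the namespace and table name are assumptions, not taken from the linked post):
import { DynamoDB, CloudWatch } from 'aws-sdk';

const ddb = new DynamoDB();
const cw = new CloudWatch();

// Runs on an EventBridge schedule: read the ~6-hourly ItemCount estimate
// and publish it as a custom CloudWatch metric to build up history.
export const handler = async () => {
  const { Table } = await ddb.describeTable({ TableName: 'test' }).promise();
  await cw.putMetricData({
    Namespace: 'Custom/DynamoDB', // assumption: any custom namespace works
    MetricData: [{
      MetricName: 'ItemCount',
      Dimensions: [{ Name: 'TableName', Value: 'test' }],
      Value: Table?.ItemCount ?? 0,
    }],
  }).promise();
};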
You could use a DynamoDBMapper query.
PaginatedQueryList<YourModel> list = dynamoDBMapper.query(YourModel.class, queryExpression);
int count = list.size();
Calling size() triggers loadAllResults(), which lazily loads the next available results until allResultsLoaded is true.
Ref: https://docs.amazonaws.cn/en_us/amazondynamodb/latest/developerguide/DynamoDBMapper.Methods.html#DynamoDBMapper.Methods.query
This is how you would do it using the DynamoDBMapper (Kotlin syntax), example with no filters at all:
dynamoDBMapper.count(MyEntity::class.java, DynamoDBScanExpression())
$aws = new Aws\DynamoDb\DynamoDbClient([
    'region' => 'us-west-2',
    'version' => 'latest',
]);
$result = $aws->scan(array(
    'TableName' => 'game_table',
    'Count' => true
));
echo $result['Count'];
len(response['Items'])
will give you the count of the filtered rows, where
fe = Key('entity').eq('tesla')
response = table.scan(FilterExpression=fe)
I used Scan to get the total count of the required table. Following is a Java code snippet for the same:
long totalItemCount = 0;
ScanResult result = null;
do {
    ScanRequest req = new ScanRequest();
    req.setTableName(tableName);
    if (result != null) {
        req.setExclusiveStartKey(result.getLastEvaluatedKey());
    }
    result = client.scan(req);
    totalItemCount += result.getItems().size();
} while (result.getLastEvaluatedKey() != null);
System.out.println("Result size: " + totalItemCount);
This is a solution for AWS JavaScript SDK users; it is almost the same for other languages.
result.data.Count will give you what you are looking for:
apigClient.getitemPost({}, body, {})
    .then(function(result) {
        var dataoutput = result.data.Items[0];
        console.log(result.data.Count);
    }).catch(function(result) {
    });
