How to get just the size of the value in BerkeleyDB?

Is there a way to get only the length (in bytes) of a value stored in BDB? I don't need the entire data array, only its size.

If you don't want to have to retrieve the entire entry and aren't using the DPL, I'd say you should add a secondary index on the size of the stored byte array and make sure your DAO keeps that value up to date on every save or update. You could add a KeyCreator that derives a size key in a secondary database from each record.
What type of query are you trying to perform? Are you trying to search for all records of a given size, or are you wanting to know the size of a certain record before you retrieve it? I think the latter question is harder to answer.
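If you go that route with the JE API, a minimal sketch of a size-keyed secondary index could look like the following; the database name "sizeIndex", the integer key encoding and the omission of transaction handling are my assumptions, not something stated above.

import com.sleepycat.bind.tuple.IntegerBinding;
import com.sleepycat.je.Database;
import com.sleepycat.je.DatabaseEntry;
import com.sleepycat.je.Environment;
import com.sleepycat.je.SecondaryConfig;
import com.sleepycat.je.SecondaryDatabase;
import com.sleepycat.je.SecondaryKeyCreator;

public class SizeIndexExample {

    // Derives the secondary key (the value's size in bytes) from each primary record.
    static class SizeKeyCreator implements SecondaryKeyCreator {
        @Override
        public boolean createSecondaryKey(SecondaryDatabase secondary,
                                          DatabaseEntry key,
                                          DatabaseEntry data,
                                          DatabaseEntry result) {
            IntegerBinding.intToEntry(data.getSize(), result);
            return true;
        }
    }

    static SecondaryDatabase openSizeIndex(Environment env, Database primary) {
        SecondaryConfig config = new SecondaryConfig();
        config.setAllowCreate(true);
        config.setSortedDuplicates(true);          // many records can share the same size
        config.setKeyCreator(new SizeKeyCreator());
        // "sizeIndex" is an arbitrary name chosen for this sketch; pass a Transaction
        // instead of null if the environment is transactional.
        return env.openSecondaryDatabase(null, "sizeIndex", primary, config);
    }
}

With this in place you can search or range-scan records by size without touching the primary values.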

I'm assuming you're using the JE version (or the Java binding of BDB), in which case, once you get the DatabaseEntry for the desired key, getSize() should give you what you want.
If you're using the C binding, check the DBT handle's size field.
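A minimal sketch of that with the JE API (the helper is mine; note that a plain get still transfers the whole value, it just hands you its length for free):

import com.sleepycat.je.Database;
import com.sleepycat.je.DatabaseEntry;
import com.sleepycat.je.LockMode;
import com.sleepycat.je.OperationStatus;

public class ValueSizeExample {

    // Returns the stored value's length in bytes, or -1 if the key is not found.
    static int valueSize(Database db, byte[] keyBytes) {
        DatabaseEntry key = new DatabaseEntry(keyBytes);
        DatabaseEntry data = new DatabaseEntry();
        OperationStatus status = db.get(null, key, data, LockMode.DEFAULT);
        return status == OperationStatus.SUCCESS ? data.getSize() : -1;
    }
}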

If you store your document ids as duplicate data items, instead of as one blob data item value, then you can use DBC->count() to detect the number of matching documents without actually retrieving the long list of ids. Otherwise, the Berkeley DB API does not seem to support what you're asking for (even though you'd think it could be efficient for them to add it). I puzzled over this as well, and that was the solution I came up with for my own project.
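In the Java binding the counterpart of DBC->count() is Cursor.count(); a rough sketch, assuming the database was opened with sorted duplicates so that each document id is its own data item:

import com.sleepycat.je.Cursor;
import com.sleepycat.je.Database;
import com.sleepycat.je.DatabaseEntry;
import com.sleepycat.je.LockMode;
import com.sleepycat.je.OperationStatus;

public class DuplicateCountExample {

    // Counts the duplicate data items stored under one key without reading them all back.
    static int duplicateCount(Database db, byte[] keyBytes) {
        DatabaseEntry key = new DatabaseEntry(keyBytes);
        DatabaseEntry data = new DatabaseEntry();
        data.setPartial(0, 0, true);               // don't pull the data item itself back
        Cursor cursor = db.openCursor(null, null);
        try {
            OperationStatus status = cursor.getSearchKey(key, data, LockMode.DEFAULT);
            return status == OperationStatus.SUCCESS ? cursor.count() : 0;
        } finally {
            cursor.close();
        }
    }
}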

For your problem, using the DB_DBT_PARTIAL flag and asking for the beginning of the record will give you your first IDs, and the DBT's size field can be used to compute the total number of IDs.

Related

In Azure Search, can an indexer combine information from different documents into a single index item without them overwriting each other?

My goal is to create a single searchable Azure Index that has all of the relevant information currently stored in many different sql tables.
I'm also using an Azure Cognitive Service to add additional info from related documents. Each document is tied to only a single item in my Index, but each item in the index will be tied to many documents.
According to my understanding, if two documents have the same value for the indexer's Key, then the index will overwrite the extracted information from the first document with the information extracted from the second. I'm hoping there's a way to append the information instead of overwriting it. For example: if two documents relate to the same index item, I want the values mapped to keyphrases for that item to include the keyphrases found in the first document and the keyphrases found in the second document.
Is this possible? Is there a different way I should be approaching this?
If it is possible, can I do it without having duplicate values?
Currently I have multiple indexes and I'm combining the search results from each one, but this seems inefficient and likely messes up the default scoring algorithm.
Every code example I find only has one document for each index item and doesn't address my problem. Admittedly, I haven't tried to set up my index as described above, because it would take a lot of refactoring, and I'm confident it would just overwrite itself.
I am currently creating my indexes and indexers programmatically using dotnet. I'm assuming my code isn't relevant to my question, but I can provide it if need be.
Thank you so much! I'd appreciate any feedback you can give.
Edit: I'm thinking about creating a custom skill to do the aggregation for me, but I don't know how the skill would get access to everything it needs. It needs the extracted info from the current document, and it needs the previously aggregated info from earlier documents. I guess the custom skill could perform a search on the index and get the item that way, but that sounds dangerously hacky. Any thoughts would be appreciated.
Pasting from docs:
Indexing actions: upload, merge, mergeOrUpload, delete
You can control the type of indexing action on a per-document basis, specifying whether the document should be uploaded in full, merged with existing document content, or deleted.
Whether you use the REST API or an SDK, the following document operations are supported for data import:
Upload, similar to an "upsert" where the document is inserted if it is new, and updated or replaced if it exists. If the document is missing values that the index requires, the document field's value is set to null.
merge updates a document that already exists, and fails a document that cannot be found. Merge replaces existing values. For this reason, be sure to check for collection fields that contain multiple values, such as fields of type Collection(Edm.String). For example, if a tags field starts with a value of ["budget"] and you execute a merge with ["economy", "pool"], the final value of the tags field is ["economy", "pool"]. It won't be ["budget", "economy", "pool"].
mergeOrUpload behaves like merge if the document exists, and upload if the document is new.
delete removes the entire document from the index. If you want to remove an individual field, use merge instead, setting the field in question to null.
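Given that merge replaces collection fields outright, the append behaviour asked about has to happen on the caller's side: read the current index item, union in the new keyphrases, then write the result back. A rough sketch with the azure-search-documents Java client; the index name and the "id"/"keyphrases" field names are placeholders, not taken from the question:

import com.azure.core.credential.AzureKeyCredential;
import com.azure.search.documents.SearchClient;
import com.azure.search.documents.SearchClientBuilder;
import com.azure.search.documents.SearchDocument;

import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class AppendKeyphrasesExample {

    // Reads the existing index item, unions in the new keyphrases, and writes it back.
    static void appendKeyphrases(SearchClient client, String itemKey, List<String> newPhrases) {
        SearchDocument existing = client.getDocument(itemKey, SearchDocument.class);

        Set<String> merged = new LinkedHashSet<>();
        Object current = existing.get("keyphrases");       // placeholder field name
        if (current instanceof List<?>) {
            for (Object phrase : (List<?>) current) {
                merged.add(String.valueOf(phrase));
            }
        }
        merged.addAll(newPhrases);

        SearchDocument update = new SearchDocument();
        update.put("id", itemKey);                          // placeholder key field name
        update.put("keyphrases", new ArrayList<>(merged));

        // mergeOrUpload replaces the collection field with the unioned value built above.
        client.mergeOrUploadDocuments(List.of(update));
    }

    public static void main(String[] args) {
        SearchClient client = new SearchClientBuilder()
                .endpoint("https://<service>.search.windows.net")  // placeholder endpoint
                .indexName("my-index")                             // placeholder index name
                .credential(new AzureKeyCredential("<admin-key>")) // placeholder key
                .buildClient();
        appendKeyphrases(client, "item-1", List.of("economy", "pool"));
    }
}

Note that this read-modify-write is not atomic: concurrent indexer runs can still drop each other's keyphrases, which is part of why a custom skill that searches the index feels hacky.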

Is there a better way to give arbitrary order to records in Postgres?

I'm trying to create an API that returns an ordered JSON list (the order can be specified by the client). I tried implementing this by setting a float position column on each record on the application side (first position minus 1 to append to the front, last position plus 1 to append to the end, and the midpoint of two neighbours to insert between elements).
However, since I know there's sequence type on Postgres, I wonder if there is a better way to implement this.
Given requirements:
If order is not specified, append the record to the end.
Otherwise, the record can be inserted at any position.
The client can delete a record at any position.
The order of the returned list is guaranteed.
To start with, a sequence is just a "counter" to be used in auto-generated fields. It is used when you need an always-increasing integer value. I don't think it is possible to use it the way you require.
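For reference, a minimal sketch of the float-position scheme described in the question, using JDBC; the items table and its columns are invented for illustration:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class OrderedInsertExample {

    // Appends to the end when no position is specified.
    static void append(Connection conn, String title) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO items (title, position) " +
                "SELECT ?, COALESCE(MAX(position), 0) + 1 FROM items")) {
            ps.setString(1, title);
            ps.executeUpdate();
        }
    }

    // Inserts between two existing records by taking the midpoint of their positions.
    static void insertBetween(Connection conn, String title,
                              double beforePos, double afterPos) throws SQLException {
        double newPos = (beforePos + afterPos) / 2.0;
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO items (title, position) VALUES (?, ?)")) {
            ps.setString(1, title);
            ps.setDouble(2, newPos);
            ps.executeUpdate();
        }
    }
}

One caveat with this scheme: repeatedly inserting between the same two neighbours halves the gap each time and eventually exhausts float precision, so an occasional renumbering pass is usually needed.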

Wordpress: Separate meta fields vs single array field

I have 200 fields for a single post. I was wondering whether to put them into one single field as an array or into multiple fields. Getting a value out of a single array field takes a lot more work than getting it from separate fields. Which would be advisable for a large site?
It would depend on your management requirements too. If a person will be editing that custom field with a huge array in it they'd need some technical knowledge.
If management isn't important you're better off storing all the data in one field to reduce the number of times you'll be firing the get_post_meta() function. Your code will be much faster if you parse the value into an array and then work with it.
Alternatively, if you'll only need a small number of these values it would be better to run get_post_meta() a couple times than to retrieve and work with the entire array.
You can test both scenarios using this method.
If you are using update_post_meta and get_post_meta then you don't need to do much work to format the data.
Ex:
$meta = array( "location" => "bangalore", "time" => "4pm", "no" => "12345678" );
update_post_meta( $post_id, "meta_key", $meta );
To retrieve it:
get_post_meta( $post_id, "meta_key", true );

Another database table or a json object

I have two tables: stores and users. Every user is assigned to a store. I thought, "What if I just saved all the users assigned to a store as a JSON object in a field of that store?" In other words, user data would be stored in a field instead of in its own table. There will be around 10 users per store. I would like to know which method requires the least amount of processing on the server.
Most databases are relational, meaning there's no reason to be putting multiple different fields into one column. Besides being more work for you, having to put them together and take them apart, you'd basically be ignoring the strength of the database.
If you ever try to access the data from another app, you'll have to go through additional steps. It also limits sorting and greatly adds to your querying difficulties (i.e. you can't say WHERE field = value because one field contains many values).
In your specific example, if the users at a store change, rather than being able to do a very efficient delete from the users table (or modify which store they're assigned to) you'd have to fetch the data and edit it, which would double your overhead.
Joins exist for a reason, and they are efficient. So, don't fear them!

Store array of numbers in database field

Context: SQL Server 2008, C#
I have an array of integers (0-10 elements). Data doesn't change often, but is retrieved often.
I could create a separate table to store the numbers, but for some reason it feels like that wouldn't be optimal.
Question #1: Should I store my array in a separate table? Please give reasons for one way or the other.
Question #2: (regardless of what the answer to Q#1 is), what's the "best" way to store int[] in database field? XML? JSON? CSV?
EDIT:
Some background: the numbers being stored are just coefficients that don't participate in any relationship and are always used as an array (i.e. a value is never retrieved or used in isolation).
Separate table, normalized
Not as XML or JSON, but separate numbers in separate rows.
No matter what you think, it's the best way. You can thank me later
The "best" way to store data in a database is the way that is most conducive to the operations that will be performed on it and the one which makes maintenance easiest. It is this later requirement which should lead you to a normalized solution which means storing the integers in a table with a relationship. Beyond being easier to update, it is easier for the next developer that comes after you to understand what and how the information is stored.
Store it as a JSON array but know that all accesses will now be for the entire array - no individual read/writes to specific coefficients.
In our case, we're storing them as a JSON array. As in your case, there is no relationship between the individual array numbers; the array only makes sense as a unit, and as a unit it DOES have a relationship with other columns in the table. By the way, everything else IS normalized. I liken it to this: if you were going to store a 10-byte chunk, you'd save it packed in a single column of VARBINARY(10). You wouldn't shard it into 10 bytes, store each in a column of VARBINARY(1), and then stitch them together with a foreign key. I mean you could, but it wouldn't make any sense.
YOU as the developer will need to understand how 'monolithic' that array of int's really is.
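If you do go the JSON route, the round trip in application code is small; a sketch using Jackson (my choice of serializer, it isn't named in the thread):

import com.fasterxml.jackson.databind.ObjectMapper;

public class CoefficientJsonExample {

    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Serializes the coefficients to a JSON array string for storage in a varchar/nvarchar column.
    static String toJson(int[] coefficients) throws Exception {
        return MAPPER.writeValueAsString(coefficients);    // e.g. "[10,2,44,1]"
    }

    // Parses the stored JSON array back into an int[].
    static int[] fromJson(String stored) throws Exception {
        return MAPPER.readValue(stored, int[].class);
    }
}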
A separate table would be the most "normalized" way to do this. And it is better in the long run, probably, since you won't have to parse the value of the column to extract each integer.
If you want you could use an XML column to store the data, too.
Sparse columns may be another option for you, too.
If you want to keep it really simple you could just delimit the values: 10;2;44;1
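A quick sketch of that delimited approach in plain Java, keeping the ';' separator from the example:

import java.util.Arrays;
import java.util.stream.Collectors;

public class DelimitedIntsExample {

    // Formats an int[] as a ';'-delimited string, e.g. {10, 2, 44, 1} -> "10;2;44;1".
    static String format(int[] values) {
        return Arrays.stream(values)
                .mapToObj(String::valueOf)
                .collect(Collectors.joining(";"));
    }

    // Parses the delimited string back into an int[].
    static int[] parse(String stored) {
        if (stored == null || stored.isEmpty()) {
            return new int[0];
        }
        return Arrays.stream(stored.split(";"))
                .mapToInt(Integer::parseInt)
                .toArray();
    }
}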
I think the fact that you're talking about SQL Server indicates your app may be a data-driven application. If that is the case, I would definitely keep the array in the database as a separate table with a record for each value. It will be normalized and optimized for retrieval. Even if you only have a few values in the array, you may need to combine that data with other retrieved data that has to be joined with your array values, which is exactly what SQL is optimized for, using indexes, foreign keys, etc. (normalization).
That being said, you can always hard code the 10 values in your code and save the round trip to the DB if you don't need to change the values. It depends on how your application works and what this array is going to be used for.
I agree with all the others that the best option is a separate normalized table. But if you insist on having it all in the same table, don't place the array in a single column. Instead, create 10 columns and store each array value in a different column. It will save you the parsing and update problems.
