Why does MongoDB create _ids as objects?

The _ids that are generated by MongoDB are always in this form: ObjectId("5f1b0e51b931af765f21edd4")
If the main reason for creating the _id field is to have something that uniquely identifies a document, why isn't the generated _id simply in the form "5f1b0e51b931af765f21edd4"?
I don't know if I'm right, but I also suspect that the first format occupies more space.

The _ids that are generated by MongoDB are always in this form: ObjectId("5f1b0e51b931af765f21edd4")
Not really. The ids MongoDB generates are 12-byte binary values. The mongo shell renders them as ObjectId("xxx") to indicate that the value is stored as a 12-byte ObjectId and not as a 24-character (24-byte) string, which is what "5f1b0e51b931af765f21edd4" is.
I don't know if I'm right, but I also suspect that the first format occupies more space.
As stored by the server, an ObjectId occupies less space than the hex string you see on your screen (half as much, in fact: 12 bytes versus 24). To signal this compact storage, the shell's rendering of an ObjectId takes up more space on your screen.

ObjectId is a special type in Mongo. It is not like a normal object/document and takes up only 12 bytes. The ObjectId("24-character-hex-string") form is just its human-readable notation.
A 24-character string takes up at least 24 bytes, and according to the BSON spec it also stores a 4-byte length prefix and a 1-byte null terminator, so 29 bytes in total.
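The byte counts above can be checked with a small Python sketch. It hand-encodes only the value part of each BSON element, following the BSON spec's layout for strings (little-endian int32 length that counts the trailing NUL, then UTF-8 bytes, then NUL) and for ObjectIds (12 raw bytes). The helper names are illustrative, not part of any MongoDB driver, and the element's type byte and field name are left out because they are identical in both cases.

```python
import struct


def bson_string_value(s: str) -> bytes:
    # BSON string value: int32 length (including the trailing NUL), UTF-8 data, NUL
    data = s.encode("utf-8")
    return struct.pack("<i", len(data) + 1) + data + b"\x00"


def objectid_value(hex_id: str) -> bytes:
    # An ObjectId is stored as its 12 raw bytes; the 24 hex digits are only a rendering
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId must be exactly 12 bytes (24 hex digits)")
    return raw


hex_id = "5f1b0e51b931af765f21edd4"
print(len(objectid_value(hex_id)))     # 12
print(len(bson_string_value(hex_id)))  # 29
```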

Related

How do I store byte arrays inside an object in Couchbase?

I want to store byte arrays (less than 1 MB) as a field value. I know about ByteArrayDocument and storing binary data as an independent non-JSON object.
To store a field as a byte array, do I just use com.couchbase.client.core.utils.Base64 to build a string value?
Or is some other approach recommended?

If you want to store it as an attribute in your JSON document, base64 would be the right approach.
However, unless your document contains only metadata about the file itself, I don't recommend using this strategy. Documents are automatically cached, and if your document is big, the cache memory will be filled quite easily.
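As a language-neutral illustration of the base64 approach, here is a Python sketch (not the Couchbase Java SDK) that embeds bytes in a JSON document and reads them back. The field names "type" and "data" are hypothetical. Base64 output is 4 * ceil(n / 3) characters for n input bytes, roughly a third larger than the raw data, which adds to the cache-pressure concern above.

```python
import base64
import json


def to_json_with_bytes(payload: bytes) -> str:
    # Hypothetical document layout: the binary payload lives in a base64 "data" field
    doc = {"type": "attachment", "data": base64.b64encode(payload).decode("ascii")}
    return json.dumps(doc)


def bytes_from_json(text: str) -> bytes:
    # Decode the base64 field back into the original bytes
    return base64.b64decode(json.loads(text)["data"])


blob = bytes(range(256)) * 4           # 1024 bytes of sample data
doc_text = to_json_with_bytes(blob)
assert bytes_from_json(doc_text) == blob
```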

Sqlite giving column more size than needed

I have some data that I will be putting in the database. Say I define a column like "coupondetail text(10000)" to store the coupon detail; not every coupon detail will be 10,000 characters long. How much space will the column take in the database when the coupon detail is shorter than 10,000 characters, say 1,000?

SQLite does not care much about how you declare column types and ignores any maximum length you specify. The declared type is just a hint (a type affinity); any column other than an INTEGER PRIMARY KEY can hold a value of any type.
The space used in the database file depends on the values you actually store. In the record format, a string is stored as its length followed by the string data, and no empty space is reserved for the declared maximum.
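A quick way to see that the declared length is ignored is Python's built-in sqlite3 module with an in-memory database. The table name "coupon" and the extra "code TEXT(5)" column are made up for the demonstration; the second column accepts a 22-character value despite its declared length of 5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The numbers in TEXT(10000) and TEXT(5) are parsed but not enforced
conn.execute("CREATE TABLE coupon (coupondetail TEXT(10000), code TEXT(5))")
conn.execute(
    "INSERT INTO coupon VALUES (?, ?)",
    ("x" * 1000, "longer than five chars"),
)
row = conn.execute(
    "SELECT length(coupondetail), length(code), typeof(code) FROM coupon"
).fetchone()
print(row)  # (1000, 22, 'text')
```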

Amazon DynamoDB getting items as a string

I am having difficulty retrieving data from my table. I am using Amazon DynamoDB and have successfully populated my table. When I scan the table or use getItem, the returned values are of type AttributeValue. I have looked through the documentation and can't find how to process an AttributeValue to turn it into an int or a string. The scan example code on the Amazon website returns the information in a Dictionary object, but it is a dictionary mapping strings to AttributeValues. Is there any way to query a DynamoDB table and store the result in something that maps strings to strings or strings to integers?

Assuming you are using the AWS SDK for Java, objects of class AttributeValue can be of type String, Number, StringSet, or NumberSet, and the class provides the corresponding getters/setters, e.g.:
public String getN() - Numbers are positive or negative exact-value decimals and integers. A number can have up to 38 digits of precision and can be between 10^-128 and 10^+126.
public String getS() - Strings are Unicode with UTF-8 binary encoding. The maximum size is limited by the size of the primary key (1024 bytes as a range part of a key or 2048 bytes as a single-part hash key) or the item size (64 KB). These were the limits at the time of writing; the item size limit has since been raised to 400 KB.
Please note that the return value of getN() is still a string and must be converted with your preferred Java string-to-number conversion method. This implicit weak typing, with DynamoDB values retrieved and submitted as String parameters only, is a bit unfortunate and doesn't exactly make development easier; see e.g. my answer to Error in batchGetItem API in java for such an issue.
Good luck!
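The same unwrapping idea can be sketched in Python against the low-level wire format, where each attribute is a one-entry map such as {"S": "abc"} or {"N": "42"}. The helper names are hypothetical and only the S, N, SS and NS types are handled; boto3 users can instead use the TypeDeserializer class in boto3.dynamodb.types. Numbers are parsed with Decimal because a float cannot hold 38 significant digits exactly.

```python
from decimal import Decimal


def parse_number(text: str):
    # DynamoDB sends numbers as strings; Decimal keeps all 38 digits exact
    value = Decimal(text)
    # Return a plain int for integral values, otherwise keep the exact Decimal
    return int(value) if value == value.to_integral_value() else value


def unwrap(attribute: dict):
    # Each low-level AttributeValue is a one-entry map: {type tag: payload}
    (tag, payload), = attribute.items()
    if tag == "S":
        return payload
    if tag == "N":
        return parse_number(payload)
    if tag == "SS":
        return set(payload)
    if tag == "NS":
        return {parse_number(p) for p in payload}
    raise ValueError(f"unsupported attribute type: {tag}")


def plain_item(item: dict) -> dict:
    # Convert a whole item (attribute name -> AttributeValue) to plain Python values
    return {name: unwrap(attr) for name, attr in item.items()}


item = {"id": {"S": "abc"}, "age": {"N": "42"}, "score": {"N": "3.5"}}
print(plain_item(item))  # {'id': 'abc', 'age': 42, 'score': Decimal('3.5')}
```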

Dropping Leading Zeros

I have a form that records a student ID number. Some of those numbers contain a leading zero. When the number gets recorded into the database it drops the leading 0.
The field is set up to only accept numbers. The length of the student ID varies.
I need the field to be recorded and displayed with the leading zero.

If every number has a fixed length (say, always 10 characters), you can convert the stored number to a string, check its length, and add the appropriate number of leading 0s.
However, if this is an arbitrary amount of leading zeros, then you will have to store the content as a string in the database so you can capture the leading zeros.
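For the fixed-length case, the padding step is a one-liner in most languages. A minimal Python sketch, assuming every ID has exactly `width` digits once padded (the function name is illustrative):

```python
def pad_id(number: int, width: int) -> str:
    # Left-pad with zeros to a fixed width; assumes IDs never exceed `width` digits
    text = str(number)
    if number < 0 or len(text) > width:
        raise ValueError("ID is negative or longer than the fixed width")
    return text.zfill(width)


print(pad_id(93088, 8))  # 00093088
```

This only works because the width is known in advance; with arbitrary-length IDs the number alone no longer says how many zeros were dropped.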

It sounds like this should be stored as string data. The leading zeros seem to be part of the data itself, not just part of its formatting.
You could reformat the data for display with the leading zeros, but I believe you should store the correct form of the ID number. It will lead to fewer bugs down the road (e.g. you remember to format it in one place but forget in another).

There are a few ways of doing this, depending on the answers to my comments on your question:
1. Store the extra data in the database by converting the data type from numeric to varchar/string.
   Advantages: Very simple to implement; you can treat all the values in the same way.
   Disadvantages: With very large amounts of data, storage sizes will grow; indexing and sorting on strings don't perform as well.
   Use if: Each number may have an arbitrary length (and hence an arbitrary number of leading zeros).
   Don't use if: You're going to spend a lot of time sorting the data. Sorting numeric strings is a real pain; look up natural sorting to see some of the pitfalls.
2. Continue to store the data in the database as numeric, but pad it back out to a set length (e.g. 10 digits) when you read it.
   Advantages: The data will index and search better and won't require as much storage if you have large amounts of data.
   Disadvantages: Every query or display of the data requires each value to be padded to the correct length, causing a slight performance hit.
   Use if: All the output numbers will be the same length (e.g. including zeros they're all 10 digits); large amounts of sorting will be necessary.
3. Continue to store the value as numeric (to keep the sorting/indexing advantages of numeric over string types), and add a field to your table that stores the original length of the ID, including its leading zeros.
   Advantages: Reduced storage requirements; maximum use of indexing; sorting numerics is far easier than sorting numeric text; you still support IDs of arbitrary length, as with option 1.
   Disadvantages: An extra field is required in your database, so all your queries have to pull that field too, which may slightly increase resource use at query/display time.
   Use if: Storage space, indexing, or sorting performance is any sort of concern.
   Don't use if: You can't change the table structure to add the extra value, or it would overcomplicate already complex queries.
If I were you and had access to modify the DB structure slightly, I'd go with option 3, even though you need to pull out an extra field to get the length. The slightly increased complexity pays off: the cost of padding the value back out to the correct length will be far outweighed by the gains in indexing performance and storage space.
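Option 3 boils down to splitting each entered ID into a (number, length) pair for storage and padding back to the stored length on the way out. A minimal Python sketch of that round trip; the function names are hypothetical, and the two values would map to the numeric column and the new length column.

```python
from __future__ import annotations


def split_id(text: str) -> tuple[int, int]:
    # Store the numeric value plus the original length (leading zeros included)
    if not (text.isascii() and text.isdigit()):
        raise ValueError("student ID must contain only the digits 0-9")
    return int(text), len(text)


def join_id(value: int, length: int) -> str:
    # Re-pad to the stored length to recover the ID exactly as it was entered
    return str(value).zfill(length)


value, length = split_id("00123")
print(value, length)            # 123 5
print(join_id(value, length))   # 00123
```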

I worked with a database that had a similar problem: it stored ZIP codes as numbers. The consequence was that people in New Jersey, whose ZIP codes start with 0, couldn't use our app.
You're using data that is logically a text string and not a number. It just happens to look like a number, but you really need to treat it as text. Use a text-oriented data type, or at least create a database view that enables you to pull back a properly formatted value for this.

See here: Pad or remove leading zeroes from numbers

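declare @recordNumber integer;
set @recordNumber = 93088;
declare @padZeroes integer;
set @padZeroes = 8;
-- prepend @padZeroes zeros, then keep only the rightmost @padZeroes characters
select
right( replicate('0', @padZeroes)
+ convert(varchar(20), @recordNumber), @padZeroes);
-- returns 00093088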

Unless you intend to do calculations on that ID, it's probably best to store it as text/string.

Another option: since the field is an ID, I would recommend creating a secondary display-number field (nvarchar) that you can use for reports, etc.
Then, when the student ID is entered in your application, you can insert it into the database both as the number and as the display number.

An Oracle solution
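Store the ID as a number and convert it to a character string for display. For instance, to display 42 as a zero-padded, three-character string:
SQL> select to_char(42, 'FM099') from dual;
042
The FM prefix suppresses the leading blank that Oracle otherwise reserves for the sign, so without it the result would be ' 042'. Change the format string to fit your needs.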
(I don't know if this is transferable to other SQL flavors, however.)

You could prepend '1' to the ID when storing it in the database. When retrieving it, treat it as a string and remove the first character.
MySQL example:
SET @student_id = '123456789';
INSERT INTO student_table (id, name) VALUES (CONCAT('1', @student_id), 'John Smith');
...
SELECT SUBSTRING(id, 2) FROM student_table;
Mathematically:
Initially I overthought it and did it mathematically, adding a power of ten to the student ID based on its length (e.g. 1,000,000,000 if it's 9 digits) before storing it:
SET @new_student_id = ABS(@student_id) + POW(10, CHAR_LENGTH(@student_id));
INSERT INTO student_table (id, name) VALUES (@new_student_id, 'John Smith');
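Both variants of the sentinel trick produce the same stored number, because prepending '1' to an n-digit string adds exactly 10^n. A small Python sketch of the encode/decode logic (function names are hypothetical; this mirrors the MySQL statements rather than calling a database):

```python
def encode_with_sentinel(student_id: str) -> int:
    # Prepend "1" so leading zeros survive storage in a numeric column
    if not (student_id.isascii() and student_id.isdigit()):
        raise ValueError("student ID must contain only the digits 0-9")
    return int("1" + student_id)


def encode_mathematically(student_id: str) -> int:
    # Same result: add 10 ** (number of digits) to the numeric value
    return int(student_id) + 10 ** len(student_id)


def decode_sentinel(stored: int) -> str:
    # Drop the leading "1" to get the original string back
    return str(stored)[1:]


print(encode_with_sentinel("0012345"))   # 10012345
print(encode_mathematically("0012345"))  # 10012345
print(decode_sentinel(10012345))         # 0012345
```

Note that the stored value has one more digit than the ID, so a 10-digit ID no longer fits in a signed 32-bit INT column (maximum 2,147,483,647) and would need a BIGINT.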

Size of varchar columns

In SQL Server, does it make a difference whether I define a varchar column with length 32 or 128?

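A varchar is a variable-length character field: it holds text up to a declared maximum length. A varchar(32) can hold at most 32 characters, whereas a varchar(128) can hold up to 128. If you assign "12345" to a varchar(3) variable, or CAST it to varchar(3), this is the data that will be kept:
"123"
The "45" is truncated (lost). An INSERT of the same value into a varchar(3) column, by contrast, fails with a "String or binary data would be truncated" error under SQL Server's default ANSI_WARNINGS ON setting.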
They are very useful in instances where you know that a certain field will only be (or only should be) a certain length at maximum. For example: a zip code or state abbreviation. In fact, they are generally used for almost all types of text data (such as names/addresses/et cetera) - but in these instances you must be careful that the number you supply is a sane maximum for the type of data that will fill that column.
However, when using them you must also be careful to limit user input to the maximum number of characters the field supports. Otherwise, truncation or errors on the user's input may lead to confusion.

There should be no noticeable difference, as the backend only stores the data you actually insert into that column (plus 2 bytes of overhead to record its length). It's not padded out to the full size of the field like it is with a char column.
Edit: For more info, this link should be useful.
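A simplified Python model of the storage rule described above: a varchar value costs its actual byte length plus 2 bytes, regardless of the declared maximum, while values longer than the maximum are rejected the way an INSERT is under the default settings. This is an illustrative assumption-laden sketch, not SQL Server internals: it uses latin-1 as a stand-in for a single-byte code page and ignores row headers, the null bitmap, and page-level overhead.

```python
def varchar_storage_bytes(value: str, declared_length: int) -> int:
    # Assumption: single-byte code page, modeled here with latin-1
    data = value.encode("latin-1")
    if len(data) > declared_length:
        # Mimics the error an INSERT raises instead of silently truncating
        raise ValueError("String or binary data would be truncated")
    # Actual data length plus 2 bytes of length overhead; the declared size is irrelevant
    return len(data) + 2


print(varchar_storage_bytes("hello", 32))   # 7
print(varchar_storage_bytes("hello", 128))  # 7
```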

It should not. The length just defines the maximum the column can accommodate; the space actually used depends on the data inserted.