Amazon DynamoDB getting items as a string - database

I am having difficulty retrieving data from my table. I am using Amazon DynamoDB and I have successfully populated my table. When I scan the table or use getItem, the returned information is of type AttributeValue. I have looked through the documentation and I can't find how you should process an AttributeValue to turn it into an int or a string. The example scan code on the Amazon website returns the information in a Dictionary object, but it is a dictionary with strings mapped to AttributeValues. Do you know of any way to query a DynamoDB table and store the result in something where strings are mapped to strings or strings are mapped to integers?

Assuming you are using the AWS SDK for Java, objects of Class AttributeValue can be of type String, Number, StringSet, NumberSet and the class features respective getters/setters accordingly, e.g.:
public String getN() - Numbers are positive or negative exact-value decimals and integers. A number can have up to 38 digits of precision and can be between 10^-128 and 10^+126.
public String getS() - Strings are Unicode with UTF-8 binary encoding. The maximum size is limited by the size of the primary key (1024 bytes as a range part of a key or 2048 bytes as a single part hash key) or the item size (64k).
Please note that the return value of getN() is still a string and must be converted by your Java string to number conversion method of choice accordingly. This implicit weak typing of the DynamoDB data types retrieval/submission based on String parameters only is a bit unfortunate and doesn't exactly ease developing, see e.g. my answer to Error in batchGetItem API in java for such an issue.
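As a rough illustration of that conversion step, here is a minimal sketch in Python rather than Java. It assumes each attribute value arrives in DynamoDB's low-level typed form, i.e. a one-entry mapping such as {"N": "42"} or {"S": "abc"}. The helper names unwrap_value and unwrap_item are made up for this example and are not part of any AWS SDK.

```python
from decimal import Decimal


def unwrap_value(attribute_value):
    """Turn one typed value like {"N": "42"} into a plain Python value."""
    (type_tag, raw), = attribute_value.items()  # exactly one tag per value
    if type_tag == "S":
        return raw
    if type_tag == "N":
        # Numbers travel as strings; Decimal keeps every digit exactly.
        number = Decimal(raw)
        return int(number) if number == number.to_integral_value() else number
    if type_tag == "SS":
        return set(raw)
    if type_tag == "NS":
        return {Decimal(n) for n in raw}
    raise ValueError(f"unsupported type tag: {type_tag}")


def unwrap_item(item):
    """Convert a whole item (attribute name -> typed value) to plain values."""
    return {name: unwrap_value(value) for name, value in item.items()}


# Hypothetical item, shaped like the result of a scan or getItem call.
item = {"id": {"N": "42"}, "name": {"S": "Alice"}, "tags": {"SS": ["a", "b"]}}
plain = unwrap_item(item)
```

The result is an ordinary dictionary of strings mapped to strings, numbers, or sets, which is the shape the question asks for.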
Good luck!

Related

Why does MongoDB create _ids as objects?

The _ids that are generated by MongoDB are always in this form: ObjectId("5f1b0e51b931af765f21edd4")
If the main reason for creating the _id column is to have something that uniquely identifies a document, why is the generated _id not simply in this form: "5f1b0e51b931af765f21edd4"?
I don't know if I'm right, but also I suspect that the first format occupies more space.
> The _ids that are generated by MongoDB are always in this form: ObjectId("5f1b0e51b931af765f21edd4")
Not at all. Ids generated by MongoDB are 12-byte byte sequences. mongo shell uses the rendering ObjectId("xxx") to indicate that the value is stored as a 12-byte ObjectId and not as a 24-byte string, which is what "5f1b0e51b931af765f21edd4" is.
> I don't know if I'm right, but also I suspect that the first format occupies more space.
As stored by the server, ObjectId occupies less space than a hex string you see on your screen (half as much, in fact). To convey this compact storage, the rendering of an ObjectId occupies more space on your screen.
ObjectId is a special type in Mongo. It is not like a normal object/document and only takes up 12 bytes. The ObjectId("24-character-hex-string") is just its human-readable notation.
A 24-character string takes up at least 24 bytes, and according to the BSON spec it also stores a 4-byte length prefix and a 1-byte null terminator, so 29 bytes in total.
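The arithmetic above can be checked with a short Python sketch. It builds the value bytes by hand from the BSON layout rather than using a MongoDB driver, and it leaves out the element type byte and field name, which both representations share.

```python
import struct

hex_id = "5f1b0e51b931af765f21edd4"

# An ObjectId value is just the 12 raw bytes behind the 24 hex characters.
object_id_bytes = bytes.fromhex(hex_id)

# A BSON string value is: int32 length (counting the trailing NUL),
# then the UTF-8 bytes, then one NUL byte.
utf8 = hex_id.encode("utf-8")
bson_string_bytes = struct.pack("<i", len(utf8) + 1) + utf8 + b"\x00"

print(len(object_id_bytes))    # 12
print(len(bson_string_bytes))  # 29
```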

ByteArray insertion in MarkLogic using "temporal.documentInsert" inserts but returns twice the count of ByteArray?

I inserted a document into MarkLogic using temporal.documentInsert, passing a ByteArray of count 5000, but when I retrieve the data after insertion using cts.doc, it returns a ByteArray count of 10000 (double the initial value).
Can someone explain why?
I can find nothing referencing 'ByteArray's in the docs.
What did you use to get the 'count' of the document?
My guess is that there is a byte -> char conversion, which will occur both on insert and on get, in the Java JVM. Java chars are 16 bits (2 bytes). The details depend on the encoding and on exactly which Java API you used to get the 'count' (count of what?), but an exact 2x difference is suspiciously identical to a byte -> char conversion in Java.
If you convert your document to a String using an explicit charset for the conversion, what is the string length (in chars) for the input and output documents, as seen in Java using String.length?
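To make that guess concrete, here is a small Python sketch (not MarkLogic or Java code) of how a one-byte-per-character conversion followed by 16-bit storage doubles a count. The 5000-byte payload and the choice of ISO-8859-1 are assumptions made for the illustration; whether this is what actually happens in the asker's setup is not confirmed.

```python
payload = bytes(range(250)) * 20   # 5000 arbitrary bytes

# Treat every byte as one character (ISO-8859-1 maps bytes 0-255 one-to-one).
as_text = payload.decode("latin-1")

# Store the characters as 16-bit code units, the way a Java char[] does.
as_utf16 = as_text.encode("utf-16-le")

print(len(payload), len(as_text), len(as_utf16))  # 5000 5000 10000
```

The character count stays at 5000, so counting chars and counting the bytes behind them give answers that differ by exactly a factor of two.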

Storing hexadecimal values

I'd like to store this value efficiently in MSSQL 2016:
6d017ed2a48846f0ac025dd8603902c7
i.e., fixed-length, with digits ranging from 0 to f, so hexadecimal, right?
Char(32) seems too expensive.
Any kind of help would be appreciated. Thank you!
In almost all cases you shouldn't store this as a string at all. SQL Server has binary and varbinary types.
This string represents a 16-byte binary value. If the expected size is fixed, it can be stored as a binary(16). If the size changes, it can be stored as a varbinary(N) where N is the maximum expected size.
Don't use varbinary(max), that's meant to store BLOBs and has special storage and indexing characteristics.
Storing the string itself would make sense in a few cases, e.g. if it's a hash string used in an API, or it's meant to be shown to humans. In those cases the data will always arrive as a string and will always have to be converted back to a string to be used, so the constant conversions will probably cost more than the storage savings are worth.
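The size difference is easy to see outside the database. The following Python sketch only shows the conversion an application might do before writing to a binary(16) column and after reading from it; it does not touch SQL Server itself.

```python
hex_value = "6d017ed2a48846f0ac025dd8603902c7"

raw = bytes.fromhex(hex_value)   # what a binary(16) column would hold
assert len(raw) == 16            # half of the 32 characters in char(32)

# Converting back for display or for an API that expects the hex string.
round_trip = raw.hex()
```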

How do I index variable length strings, integers, binaries in b-tree?

I am creating a database storage engine (for fun).
I know databases use B-trees (and so on), but all the basic B-tree examples show that keys must be sorted before they are stored for indexing, and they only show this for integers.
I can understand sorting integers, but how do I do it if I have a string as the index key?
E.g. I want to index all email addresses in a B-tree. How would I do that?
It does not matter what type of data you are sorting. For a B-tree you only need a comparator. The first value you put into your db becomes the root. The second value gets compared to the root: if smaller, continue down left, else right (a real B-tree node holds several sorted keys, but the comparison step is the same). Inserting new values often requires restructuring the tree.
A comparator for a string could use the length of the string or compare it alphabetically or count the dots in an email behind the at-sign.
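A minimal Python sketch of the idea follows. The comparator orders email addresses by their lowercased UTF-8 bytes, which is just one possible choice, and find_slot is a hypothetical helper that searches the sorted keys of a single node. Node splitting and child pointers are left out.

```python
from functools import cmp_to_key


def compare_emails(a, b):
    """Three-way comparator: negative if a < b, 0 if equal, positive if a > b."""
    ka, kb = a.lower().encode("utf-8"), b.lower().encode("utf-8")
    return (ka > kb) - (ka < kb)


def find_slot(node_keys, key, compare):
    """Binary search inside one sorted node; returns the insert position."""
    lo, hi = 0, len(node_keys)
    while lo < hi:
        mid = (lo + hi) // 2
        if compare(node_keys[mid], key) < 0:
            lo = mid + 1
        else:
            hi = mid
    return lo


# Keys of one node, kept sorted with the same comparator used for lookups.
node = sorted(["bob@example.com", "alice@example.com", "Carol@example.org"],
              key=cmp_to_key(compare_emails))
slot = find_slot(node, "brian@example.com", compare_emails)
```

The tree code never looks inside the keys. Swapping compare_emails for an integer or raw-bytes comparator is enough to index a different type, as long as the comparator defines a consistent total order.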

Dropping Leading Zeros

I have a form that records a student ID number. Some of those numbers contain a leading zero. When the number gets recorded into the database it drops the leading 0.
The field is set up to only accept numbers. The length of the student ID varies.
I need the field to be recorded and displayed with the leading zero.
If you are always going to have a number of a certain length (say, it will always be 10 characters), then you can just get the length of the number in the database (after it is converted to a string) and then add the appropriate 0's.
However, if this is an arbitrary amount of leading zeros, then you will have to store the content as a string in the database so you can capture the leading zeros.
It sounds like this should be stored as string data. It sounds like the leading zeros are part of the data itself, not just part of its formatting.
You could reformat the data for display with the leading zeros in it, but I believe you should store the correct form of the ID number. It will lead to fewer bugs down the road (e.g. you format it in one place but forget to in another).
There are a few ways of doing this, depending on the answers to my comments on your question:
1. Store the extra data in the database by converting the datatype from numeric to varchar/string.
Advantages: Very simple to implement; you can treat all the values in the same way.
Disadvantages: With very large amounts of data, storage sizes will escalate; indexing and sorting on strings doesn't perform as well.
Use if: Each number may have an arbitrary length (and hence number of zeros).
Don't use if: You're going to spend a lot of time sorting data. Sorting numeric strings is a pain in the ass; look up natural sorting to see some of the pitfalls.
2. Continue to store the data in the database as numeric, but pad the numeric back to a set length (e.g. 10, as I have suggested in my example below).
Advantages: Data will index better, search better, and not require as much storage if you've got large amounts of data.
Disadvantages: Every query or display of data will require every value to be padded to the correct length, causing a slight performance hit.
Use if: All the output numbers will be the same length (i.e. including zeros they're all, for example, 10 digits); large amounts of sorting will be necessary.
3. Add a field to your table to store the original length of the numeric. Continue to store the value as numeric (to leverage the sorting/indexing performance gains of numeric vs. string), and in the new field store the length including the significant zeros.
Advantages: Reduction in required storage space; maximum use of indexing; sorting numerics is far easier than sorting text numerics; you still get the ability to pad numerics to arbitrary lengths as with option 1.
Disadvantages: An extra field is required in your database, so all your queries will have to pull that extra field, potentially requiring a slight increase in resources at query/display time.
Use if: Storage space/indexing/sorting performance is any sort of concern.
Don't use if: You don't have the luxury of changing the table structure to include the extra value; this will overcomplicate already complex queries.
If I were you and I had access to modify the db structure slightly, I'd go with option 3. Sure, you need to pull out an extra field to get the length, but the slightly increased complexity pays huge dividends when you weigh the advantages against the disadvantages. The performance hit of padding the string back out to the correct length will be far outweighed by the gains in indexing performance and storage space.
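Option 3 can be sketched in a few lines of Python. The function names split_id and join_id are hypothetical and the database columns are only implied. The point is that the numeric value plus the original length are enough to rebuild the ID exactly.

```python
def split_id(student_id):
    """Split '00123' into (123, 5): the numeric value and the original length."""
    if not (student_id.isascii() and student_id.isdigit()):
        raise ValueError("student ID must contain digits only")
    return int(student_id), len(student_id)


def join_id(value, length):
    """Rebuild the original text by left-padding with zeros."""
    return str(value).zfill(length)


value, length = split_id("00123")   # store both in the table
restored = join_id(value, length)   # rebuild at display time
```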
I worked with a database with a similar problem. They were storing zip codes as a number. The consequence was that people in New Jersey couldn't use our app.
You're using data that is logically a text string and not a number. It just happens to look like a number, but you really need to treat it as text. Use a text-oriented data type, or at least create a database view that enables you to pull back a properly formatted value for this.
See here: Pad or remove leading zeroes from numbers
declare @recordNumber integer;
set @recordNumber = 93088;
declare @padZeroes integer;
set @padZeroes = 8;
-- prepend zeros, then keep only the rightmost @padZeroes characters
select
right( replicate('0',@padZeroes)
+ convert(varchar,@recordNumber), @padZeroes);
Unless you intend to do calculations on that ID, it's probably best to store it as text/string.
Another option: since the field is an ID, I would recommend creating a secondary field for the display number (nvarchar) that you can use for reports, etc.
Then in your application when the student id is entered you can insert that into the database as the number, as well as the display number.
An Oracle solution
Store the ID as a number and convert it into a character for display. For instance, to display 42 as a zero-padded, three-character string:
SQL> select to_char(42, '099') from dual;
042
Change the format string to fit your needs.
(I don't know if this is transferable to other SQL flavors, however.)
You could just concatenate '1' to the beginning of the ID when storing it in the database. When retrieving it, treat it as a string and remove the first char.
MySQL Example:
SET @student_id = '123456789';
INSERT INTO student_table (id,name) VALUES(CONCAT('1',@student_id),'John Smith');
...
SELECT SUBSTRING(id,2) FROM student_table;
Mathematically:
Initially I overthought it and did it mathematically, by adding an integer to the student ID that depends on its length (e.g. 1,000,000,000 if it's 9 digits) before storing it.
SET @new_student_id = ABS(@student_id) + POW(10, CHAR_LENGTH(@student_id));
INSERT INTO student_table (id,name) VALUES(@new_student_id,'John Smith');
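The same arithmetic, written as a Python sketch with hypothetical helper names, shows that the added power of ten acts as a sentinel digit that can be stripped off again. It assumes the ID consists of digits only.

```python
def encode_id(student_id):
    """'012345678' -> 1012345678: add 10 ** (number of digits)."""
    return int(student_id) + 10 ** len(student_id)


def decode_id(stored):
    """Drop the sentinel leading 1 to recover the original digits."""
    return str(stored)[1:]


stored = encode_id("012345678")
original = decode_id(stored)
```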
