Redis cannot query for specific criteria

I have to store the below values:
subId 10 Recipient 999999999999 file /home/sach/ status 1
I used Redis to store these values, e.g.:
HMSET 1 subId 10 Recipient 999999999999 file /home/sach/ status 1
But with Redis I can't query for a specific criterion, since Redis can only be queried by its key fields. For example, I need to query only for Recipient 988888888888, but Redis lacks this kind of querying.
Is there any other simple database, besides Mongo and MySQL, where I can store these types of values?

With Redis, you just have to handle the secondary indexes manually, by maintaining set or hash objects.
When you add an object, pipeline the following commands:
HMSET 1 subId 10 Recipient 999999999999 file /home/sach/ status 1
SADD subId:10 1
SADD Recipient:999999999999 1
SADD file:/home/sach/ 1
SADD status:1 1
If you need to query the items for a given subId and recipient:
SINTER subId:10 Recipient:999999999999
Then you just need an extra round trip to fetch the data corresponding to the returned ids.
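The bookkeeping above can be sketched in plain Python (no Redis server required) to show how the SADD/SINTER pattern answers multi-criteria queries; `records`, `index`, `add`, and `query` are hypothetical names standing in for the Redis commands:

```python
# Simulate the Redis pattern: one hash per object (HMSET) plus one set
# per indexed field value (SADD), intersected to answer queries (SINTER).

records = {}   # key -> field dict (stands in for the HMSET hash)
index = {}     # "field:value" -> set of keys (stands in for the SADD sets)

def add(key, **fields):
    records[key] = fields
    for field, value in fields.items():
        index.setdefault(f"{field}:{value}", set()).add(key)

def query(**criteria):
    # Intersect the per-value sets, like SINTER on the index keys.
    sets = [index.get(f"{f}:{v}", set()) for f, v in criteria.items()]
    keys = set.intersection(*sets) if sets else set()
    # The extra "round trip": fetch the full hash for each matching key.
    return [records[k] for k in keys]

add("1", subId=10, Recipient="999999999999", file="/home/sach/", status=1)
add("2", subId=10, Recipient="988888888888", file="/home/sach/", status=1)

print(query(subId=10, Recipient="999999999999"))
```

The same intersection works for any combination of indexed fields, which is exactly what the SINTER command gives you on the real server.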
Actually, many distributed NoSQL stores, except the pure key/value ones (memcached, for instance), can handle secondary indexes (manually or automatically): Couchbase, CouchDB, Cassandra, Riak, Neo4j, OrientDB, HyperDex, ArangoDB, etc.


In the Snowflake ACCOUNT_USAGE.METERING_HISTORY view, will the ENTITY_ID field for WAREHOUSE_METERING entries always equal the WAREHOUSE_ID?

The documentation describes ENTITY_ID as "Internal/system-generated identifier for the service type."
From a cursory analysis it appears that this does map to WAREHOUSE_ID for single-cluster warehouses. Is this also the case for multi-cluster warehouses? E.g., if a multi-cluster warehouse is running, will there be two entries in the METERING_HISTORY view for that hour with ENTITY_ID = WAREHOUSE_ID, or one entry that combines the values? If it is just one entry, what would the value of ENTITY_ID be? The first?
A warehouse is considered to be a single entity regardless of the number of clusters it contains, so it only has one ENTITY_ID which maps to WAREHOUSE_ID. There will not be a separate entry in METERING_HISTORY for each cluster in the warehouse.

Modelling Cassandra data structure for both real-time feed and search over all data?

My project serves both real-time data and past data. It works like a feed: it shows real-time data through a socket and past data (when the user scrolls down) through a REST API. To fetch real-time data efficiently, I set the date as the partition key and the time as the clustering key. For the real-time service, I think this data structure is well modeled. But I also have to fetch a limited number of recent rows (like pagination), which should be able to reach the whole data set if requested. To serve data like the most recent 0~20 / 20~40 / 40~60 rows through REST API calls, my data-serving server has to remember what it showed before, as a bookmark, in order to load the next 20 rows continuously. With SQL I would use IDs or page & offset, but I cannot do that with Cassandra. So I tried:
SELECT * FROM examples WHERE date<='DATEMARK' AND create_at < 'TIMEMARK' AND entities CONTAINS 'something' limit 20 ALLOW FILTERING;
But since date is the partition key, I cannot use the comparison operators > and < on it. The past data could have been created long before now.
Can I satisfy my real-time + past-data requirements with Cassandra? I wonder if I have to add another DB just for accessing past data.
Yes you can, but you must change your mindset and think in NoSQL patterns: in this scenario you can store your data in a duplicated manner, writing it to another table whose partition key and clustering columns satisfy your other query.
We have been using Cassandra extensively for serving real-time + past data. I recommend you not use the ALLOW FILTERING option in Cassandra, as it is not good practice. Try to model your schema so that you are never forced to skip over columns of the primary key. Suppose you have this schema:
Created_date | Created_time | user_id | Country | Name | Activity
In this schema, Created_date, Created_time, user_id, and Country form the primary key, but you want the user_id for a particular country. In this case, even though the Country column is part of the primary key, you can't query like:
"SELECT * from table where Created_date='2020-02-14' and Country ='india' allow filtering ";
If you query in this pattern you will lose data from your result set and will get errors when working with big data, or you'll end up using the ALLOW FILTERING option, which is not recommended. So you need to change the structure of your schema to:
Created_date | Country | City | Created_time | user_id | Name | Activity
"SELECT * from table where created_date='2020-02-14' and country='india'";
Using this structure will give you consistent results and you will not face those errors. Suppose you want all the data for the last seven days: loop over the days, query each day's partition, and accumulate the results into a data structure. Hope this helps.
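The "loop over the last seven days" idea can be sketched in plain Python; here `fetch_day` stands in for one CQL SELECT against that day's partition, and all names and sample rows are illustrative:

```python
from datetime import date, timedelta

# Stand-in for the table: in Cassandra each day's rows would come back
# from one SELECT on that day's partition (partition key = date).
data_by_day = {
    date(2020, 2, 14): ["event-a", "event-b"],
    date(2020, 2, 13): ["event-c"],
    date(2020, 2, 11): ["event-d", "event-e"],
}

def fetch_day(day):
    # One query per partition; days with no data simply return nothing.
    return data_by_day.get(day, [])

def last_n_days(end_day, n):
    # Traverse the day partitions newest-first and accumulate the rows,
    # instead of comparing on the partition key with > or <.
    results = []
    for offset in range(n):
        results.extend(fetch_day(end_day - timedelta(days=offset)))
    return results

print(last_n_days(date(2020, 2, 14), 7))
```

The client remembers only the last day (and clustering-time) it reached, which serves as the pagination bookmark for the next REST call.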

Separate tables vs. one big one in Azure table storage (NoSQL)

I'm using Azure Tables and I'm trying to figure out how I should organize my data.
Each entity in a table has a PartitionKey and a RowKey, and my understanding is that partitions should be used to group similar objects for scalability. In the example on the site they used movie entities where the category (action, sci-fi, etc.) is the PartitionKey while the title (Fast and the Furious, etc.) is the RowKey.
Going with the above example, let's say we have no duplicate movies and you also wanted to keep track of each particular movie's rental history, i.e. location, due date, customer, etc.
Would it be bad practice to have one table to store all of this and use a separate partition for the rental entities? To be clear, I'm talking about a movie item and its corresponding history items together in the same denormalized table in separate partitions.
Would there be an advantage to using two separate tables and if not, then what is the point of tables?
EDIT:
PartitionKey| RowKey | prop0 | prop1 |...
------------------------------------------------...
SciFi | StarWars| foo0: bar0 | foo1: bar1|...
Rental | StarWars| foo0: bar0 | foo1: bar1|...
First of all, the notion of table storage is that you can "dump" large amounts of data knowing that the search facilities are very poor: you won't be able to issue SQL queries, hence no RDBMS features; it is simply a means of storing a lot of data.
Indeed, PartitionKey and RowKey are the only indexed columns you get with Azure table storage, which means that searching by PartitionKey or RowKey will be quicker than searching by any other column. If you need to retrieve data swiftly by arbitrary criteria, then blob storage or table storage is a no-go. If you just want to keep records for audit or history purposes, then yes.
However, if you want to use it for a video store, for instance, and need to retrieve a client's details, then as I mentioned it's bad practice; you'd be better off using an RDBMS. Lastly, you can't do JOINs or other cross-table RDBMS queries in table storage.
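The indexing point can be sketched in plain Python: modeling a table as a dict keyed by (PartitionKey, RowKey) makes key lookups direct, while filtering on any other property forces a scan of every entity (the entities and property names below are illustrative, not real Azure API calls):

```python
# Model an Azure table as a dict keyed by (PartitionKey, RowKey):
# that pair is the only "index", so everything else is a full scan.
table = {
    ("SciFi", "StarWars"): {"director": "Lucas"},
    ("Rental", "StarWars"): {"customer": "Alice", "due": "2024-01-01"},
}

def point_lookup(partition_key, row_key):
    # Fast path: direct key access, like querying by both keys.
    return table.get((partition_key, row_key))

def scan_by_property(prop, value):
    # Slow path: any other property means visiting every entity.
    return [entity for entity in table.values() if entity.get(prop) == value]

print(point_lookup("Rental", "StarWars"))
print(scan_by_property("customer", "Alice"))
```

This is why the choice of PartitionKey and RowKey is the whole design question in table storage: whatever you query by most must be encoded into those two keys.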

Perplexing Complexity - Tricky Table Join

I've been struggling with this specific set of tables for some time. There are three tables (used to be two, but it was necessary to add a third). The tables are Request, PartInfo, and Status. Request is used by the customer via a form that enters data into the table. Status is for our service agents to keep track of progress on customer requests. PartInfo is the new table, containing common data accessed by both parties.
The trick is that with each request, there is a running log of changes to that request which are stored in the same table, and linked to the original request in that series via a self-joining key called FirstRequestID (which I'll abbreviate as FID). The same is true for the Status table. Here is my basic table structure as I have currently designed it (Note: It's not too late to change the architecture if there is a better approach):
Request      PartInfo      Status
-------      --------      ------
ID           ID            ID
FID          FID           FID
PartInfoID   PartNum       PartInfoID
ProductID    Revision      StatusID
CategoryID   Description   Comments
Now say I want to display the information on a particular request (including part info and status changes) in an ASP.NET GridView. The "particular request" is identified by the FID.
Question:
How can I ensure that when I'm looking at either the Request history or the Status history, it's always pulling the proper information from the PartInfo (shared) table? In other words, what's the best way to link these three tables with the proper relationship without having 50 different junction tables to account for all the exceptions?
I apologize, but my first take on this schema was "this is a mess". This data needs to be normalized. Unfortunately, there's not enough information here to determine how best to do so. Based on your descriptions and names, I came up with the following ideas.
The main entity is the Request.
Requests contain Products
Requests contain Categories (unless the Category is an attribute of a Product)
Products contain Parts (unless it’s Categories that contain Parts)
The implication is that a Requested Product is associated with an arbitrary number of Parts (as opposed to a standardized set of Parts for that Product).
Status is used to track changes in the state of a Request over time (and is not completely dependent upon Products, Categories, or Parts)
This suggests the following tables:
REQUESTS
RequestId
DateTimeCreated
PRODUCTS
ProductId
-- Add CategoryId, if it’s a Product attribute
CATEGORIES
CategoryId
REQUESTPRODUCTS
RequestProductId
RequestId
ProductId
DateTimeAdded
-- Add StatusId if a status entry must be made every time a product is requested
-- Note extra surrogate key. RequestId + ProductId + DateTimeAdded should be the
-- natural key, unless two identical products can be requested at the same time
-- (in which case add a "Quantity" column)
REQUESTCATEGORIES
RequestId
CategoryId
DateTimeAdded
-- Surrogate key optional, as it's not referenced by other tables
-- Drop, if categories are product attributes
PARTS
PartId
REQUESTPRODUCTPARTS
RequestProductId
PartId
-- Add StatusId if a status entry must be made every time a part is requested
STATUS
StatusId
RequestId
DateTimeAdded
Comments
There's a lot of ways this could go. You may end up with a lot of "junction" tables, but then your data will have referential integrity and accurate SQL queries become much, much simpler to write.

Should I use multiple fields or JSON?

I'm designing a DB and would like to know the best way to store this:
I have a user table, and let's say each user can have 100 item slots, each storing an ID.
Should I use JSON ({slot1:123232,slot20:123123123,slot82:23123}) or create 100+ fields (slot1, slot2, ... slotn)?
Thank you.
Third alternative: create another table for slots, and have a one-to-many relationship between users and slots. Your business logic would enforce the 100-slot limit.
I would recommend against embedding JSON in the database. I'm not sure which DB you are using, but it will likely be very difficult to query the actual slot data for a given user without pulling out and parsing all 100 values.
To create a one-to-many relationship, you'll need a second table
Slots
id (primary key)
user_id (mapping to user table)
item_id (your slot # you want to store)
Now, you can do useful SQL queries like
SELECT * FROM Users,Slots WHERE Slots.user_id = Users.id AND Slots.item_id = 12345
Which would give you a list of all users who have slot item #12345
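The two-table design above can be tried end to end with Python's built-in sqlite3 module; the table and column names follow the answer, while the sample users and inserts are made up for illustration:

```python
import sqlite3

# Build the Users/Slots one-to-many design in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Slots (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES Users(id),
        item_id INTEGER
    );
""")
conn.executemany("INSERT INTO Users (id, name) VALUES (?, ?)",
                 [(1, "alice"), (2, "bob")])
conn.executemany("INSERT INTO Slots (user_id, item_id) VALUES (?, ?)",
                 [(1, 12345), (1, 999), (2, 12345)])

# All users holding item #12345, using the join from the answer.
rows = conn.execute(
    "SELECT Users.name FROM Users, Slots "
    "WHERE Slots.user_id = Users.id AND Slots.item_id = ?", (12345,)
).fetchall()
print([name for (name,) in rows])
```

An index on Slots.item_id (and on Slots.user_id) would keep this query fast as the table grows.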
Under database normalization, you should not have multivalued attributes.
You might want this.
Users
=====
UserId
UserSlots
=========
UserId
SlotId
Value
Slots
=====
SlotId
Value
You should not create 100 fields.
Create a table with two fields, the ID and your "JSON Data", which you could store in a string or other field type depending on size.
You could normalize it as others have suggested, but that would increase your save and retrieve time.
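For comparison, the single-JSON-field approach can be sketched with Python's json module; the slot numbers come from the question, and everything else is illustrative:

```python
import json

# Store all of a user's slots as one JSON string in a single column.
slots = {"slot1": 123232, "slot20": 123123123, "slot82": 23123}
stored = json.dumps(slots)   # the string you would save in the text field

# One save and one load per user are cheap...
loaded = json.loads(stored)
print(loaded["slot20"])

# ...but answering "which users hold item 23123?" now means loading and
# parsing every user's blob, which the normalized Slots table avoids.
print(23123 in loaded.values())
```

This is the save/retrieve-time trade-off the answer describes: the blob is fastest for whole-record access, while the normalized table wins as soon as you need to query inside the slot data.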
