Table Schema
Message_id | Sent_dt | Status | Status_dt
Message_id | Sent_dt - Available on sending the Message to the provider
Message_id | Status | Status_dt - Available on receiving the Status from the provider
In Cassandra Sent_dt is the clustering key and is mandatory for updating a record.
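For reference, a minimal CQL sketch of how such a table might be defined (the table name and column types are assumptions; only the column names come from the schema above). It also shows why Sent_dt must be supplied on every update:
CREATE TABLE messages (
    message_id text,
    sent_dt timestamp,
    status text,
    status_dt timestamp,
    PRIMARY KEY (message_id, sent_dt)
);

-- sent_dt is the clustering key, so every UPDATE has to include it
UPDATE messages
SET status = 'DELIVERED', status_dt = toTimestamp(now())
WHERE message_id = 'm-1' AND sent_dt = '2020-02-14 10:00:00';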
Introduced an intermediate MongoDB as follows:
Messages are stored and updated with Status in MongoDB
Live reporting is done from MongoDB to the UI
Records are copied from MongoDB to Cassandra at Sent_dt + 24 hours
Records in MongoDB are deleted at Sent_dt + 72 hours.
How to handle “Status” updates for records with:
Sent_dt between 24 and 72 hours ago? (Already copied to Cassandra)
Sent_dt more than 72 hours ago? (Already deleted from MongoDB)
Is there any alternative design for the scenario? I am consuming data from Kafka only.
Related
My project serves both real-time data and past data. It works like a feed: it shows real-time data through a socket and past data (if you scroll down) through a REST API. To get real-time data efficiently, I set the date as the partition key and the time as the clustering key. For the real-time service, I think this data structure is well modeled.
But I also have to fetch a limited number of recent records (like pagination), in a way that can eventually walk through the whole data set if requested. To serve data like the most recent 0~20 / 20~40 / 40~60 through REST API calls, my data-serving server has to remember what it showed before, as a bookmark, so it can load the next 20 records. If it were SQL I would use IDs or page & offset, but I cannot do that with Cassandra. So I tried:
SELECT * FROM examples WHERE date<='DATEMARK' AND create_at < 'TIMEMARK' AND entities CONTAINS 'something' limit 20 ALLOW FILTERING;
But since date is the partition key, I cannot use the comparison operators > and < on it. The past data could have been created very far from now.
Can I satisfy my real-time + past-data requirements with Cassandra? I wonder if I have to set up another DB just for accessing past data.
Yes you can, but you must change your mindset and think in NoSQL patterns. In this scenario you can store your data in a duplicated manner: save the same data in another table, with a different partition key and clustering column that satisfies your needs.
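As a hedged sketch of that idea (table names, the month bucket format, and the column types below are made up for illustration): keep the day-partitioned table for the real-time feed, and duplicate the rows into a second table with a coarser partition key so older pages can be read without filtering across partitions:
-- Real-time feed: one partition per day, newest rows first
CREATE TABLE examples_by_day (
    date date,
    created_at timestamp,
    entities set<text>,
    payload text,
    PRIMARY KEY (date, created_at)
) WITH CLUSTERING ORDER BY (created_at DESC);

-- Same data duplicated with a coarser partition (e.g. one per month),
-- so a bookmark timestamp can be used to page backwards through history
CREATE TABLE examples_by_month (
    month text,             -- e.g. '2020-02'
    created_at timestamp,
    entities set<text>,
    payload text,
    PRIMARY KEY (month, created_at)
) WITH CLUSTERING ORDER BY (created_at DESC);

-- Next page of 20 older rows, using the last timestamp shown as the bookmark
SELECT * FROM examples_by_month
WHERE month = '2020-02' AND created_at < '2020-02-14 10:00:00'
LIMIT 20;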
We have been using Cassandra extensively for showing real-time + past data. I request you not to use the ALLOW FILTERING option in Cassandra, as it's not a good practice. Try to design your schema properly so that your queries never have to skip over primary key columns. Suppose you have a schema:
Created_date | Created_time | user_id | Country | Name | Activity
In this schema, you are using Created_date, Created_time, user_id, and Country as the primary key, but you want the user_ids for a particular country. In this case, even though you have included the Country column in the primary key, you can't query like:
"SELECT * from table where Created_date='2020-02-14' and Country ='india' allow filtering ";
If you query in this pattern you will lose data in your result set and will get errors when working with big data, or you'll be using the ALLOW FILTERING option, which is not recommended. So you need to change the structure of your schema.
Created_date | Country | City | Created_time | user_id | Name | Activity
"SELECT * from table where created_date='2020-02-14' and country='india'";
Using this structure will give you a very consistent result and you will never face any errors. Suppose you want to get all the data for the last seven days; in that case, loop over the seven dates, query each day, and collect the results into some data structure. Hope you understand.
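For example, the restructured schema could be written roughly like this in CQL (the table name and the column types are assumptions; the key layout follows the columns listed above):
CREATE TABLE user_activity (
    created_date date,
    country text,
    city text,
    created_time timestamp,
    user_id text,
    name text,
    activity text,
    PRIMARY KEY ((created_date, country), city, created_time, user_id)
);

-- Hits a single partition, so no ALLOW FILTERING is needed
SELECT * FROM user_activity
WHERE created_date = '2020-02-14' AND country = 'india';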
I have overheard discussions like "Should this be put into a ledger instead of an update?" I have a feeling that it has to do with record keeping, but I cannot fully understand what a ledger is. Searching both on Stack Overflow and Google only turns up accounting-related articles.
So my question is, what is a ledger when talking about database applications?
A ledger usually refers to a collection of states through which an entity has passed. The difference between an update and storing the data in a ledger is that with an update you don't have a history of all the changes performed on a certain entity.
The most common example for ledgers is indeed a banking model. You can better see the difference in the example below:
With updates, every time a client withdraws or deposits money, you just update the amount of money that client owns:
user_id | amount
-----------------------------
26KRZT | 45
Having a ledger, you can keep the entire history of the transactions (and compute the amount based on the client's transactions):
user_id | operation | amount
----------------------------------------------------
26KRZT | DEPOSIT | 25
26KRZT | DEPOSIT | 35
26KRZT | WITHDRAW | 15
Basically, a ledger stores data in the database as diffs (updates relative to the previous version of an entity) so that you can get a change history for a given entity.
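For example, in SQL the current amount can be recomputed from the ledger with a simple aggregation (assuming the ledger rows above live in a table called ledger):
SELECT user_id,
       SUM(CASE WHEN operation = 'DEPOSIT'  THEN amount
                WHEN operation = 'WITHDRAW' THEN -amount END) AS balance
FROM ledger
WHERE user_id = '26KRZT'
GROUP BY user_id;
-- 25 + 35 - 15 = 45, the same value the update-in-place table holds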
(First of all, I apologize for my bad English.)
I have a case like this:
I am currently having trouble with my web application. I made a web application for a certain company using CodeIgniter 3.
I built the database using MariaDB. For the id in each table of my application database, I am using an auto-increment id. I usually deploy the web app to a cloud server (sometimes the company has their own dedicated server, but sometimes they don't). One day, there was a company that didn't want the app I had made deployed to the cloud (for security purposes, they said).
This company wanted to deploy the app to each employee's PC individually in the office, while the PCs are not connected to each other (i.e., standalone PCs / personal computers / employees' laptops). They said that every 5 months they would collect all of the data from the employees' personal computers into the company's data center, and of course the data center is not connected to the internet. I told them that's not a good way to store their data (because the data will be duplicated when I try to merge it all into one, since the id column of every table is an auto-increment id and a primary key). Unfortunately, the company still wants to keep the app that way, and I don't know how to solve this.
They have at least 10 employees who would use this web app. Accordingly, I have to deploy the app to the 10 PCs individually.
Additional info: each employee has their own unique id which they got from the company, and I made an auto-increment id for each employee, just like the table below:
id | employee_id | employee_name |
1 | 156901010 | emp1
2 | 156901039 | emp2
3 | 156901019 | emp3
4 | 156901015 | emp4
5 | 156901009 | emp5
6 | 156901038 | emp6
The problem is that whenever they fill in a form in the application, some of the tables store not the employee's company id but the new id that comes from the auto-increment.
For example, the electronic_parts table has attributes like below:
| id | electronic_part_name | kind_of_electronic_part_id |
If emp1 fills in the form from the web app, the table's contents would look like below:
| id | electronic_part_name | kind_of_electronic_part_id |
| 1 | switch | 1 |
and if emp2 fills in the form from the web app, the table's contents would look like below:
| id | electronic_part_name | kind_of_electronic_part_id |
| 1 | duct tape | 10 |
When I try to merge the contents of the tables into the data center, it falls apart because of the duplicate ids.
It gets even worse when I think about the foreign keys in other tables, like for example the customer_order table.
The table for customer_order looks like below (just a sample, not the actual table, but similar):
| id | customer_name | electronic_parts_id | cashier (a.k.a. employee_id, the auto-increment id, not the id the employee got from the company as described above) |
| 1 | Henry | 1 | 10 |
| 2 | Julie | 2 | 9 |
Does anyone know how to solve this problem? Or can someone suggest/recommend a good way to solve it?
NOTE: Each employee has their own database for the app, so the database is not centralized; it's a standalone database. That means I have to install the database on each employee's PC one by one.
This is an unconventional situation and you can have an unconventional solution.
I can suggest you two methods to solve this issue.
Instead of using auto-increment for the primary key, generate a UUID and use it as the primary key. Regarding the probability of duplicates in random UUIDs: only after generating 1 billion UUIDs every second for roughly the next 100 years would the probability of creating a single duplicate reach about 50%.
In CodeIgniter you could do this with the following code snippet.
$this->db->set('id', 'UUID()', FALSE);
This generates a 36-character hexadecimal key (with 4 dashes included).
ac689561-f7c9-4f7e-be94-33c6c0fb0672
As you can see it has dashes in the string; using the CodeIgniter DB function will insert it into the database with the dashes, and it will still work. If that does not look clean, you can remove the dashes and convert the string to a 32-character key.
You can do that with the following function, with the help of the [CodeIgniter UUID library][1].
function uuid_key() {
    $this->load->library('uuid');
    // Output a v4 UUID
    $id = $this->uuid->v4();
    $id = str_replace('-', '', $id);
    $this->db->set('id', $id, FALSE);
}
Now we have a 32-character key:
ac689561f7c94f7ebe9433c6c0fb0672
An alternative, unconventional method to tackle the situation is to add a function that logs all INSERT, UPDATE, and DELETE queries processed by the site to a local file. This way, each local installation will generate a log file with the actual list of queries that modified the DB over time, in the right sequential order. At any point in time, the state of the database is the result of the set of all the queries that happened in the past up to that date.
So every 5 months, when you are ready to collect data from the employees' personal computers, instead of taking a data dump, take this file with the full query log. (Note: such a query log won't contain auto-increment ids, as those are only generated at the moment a query is actually executed against a database.)
Use these files to import the data into your data center. This will not conflict, as the auto-increments will be generated in your data center at import time. (This assumes you never have to link the local databases to the data center at any point in the future.)
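To make that concrete, a hypothetical fragment of such a query log (using the electronic_parts table from the question) might look like this; the statements deliberately carry no id values, so the data center assigns fresh auto-increment ids when the log is replayed:
-- logged on emp1's machine
INSERT INTO electronic_parts (electronic_part_name, kind_of_electronic_part_id) VALUES ('switch', 1);
-- logged on emp2's machine
INSERT INTO electronic_parts (electronic_part_name, kind_of_electronic_part_id) VALUES ('duct tape', 10);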
[1]: https://github.com/Repox/codeigniter-uuid
Is that id used in any other tables? It would probably be involved in a JOIN. If so, you have a big problem of unraveling the ids.
If the id is not used anywhere else, then the values are irrelevant, and the rows can be renumbered. This would be done (roughly speaking) by loading the data from the various sources into the same table, but not including the id in the load.
Or, if there is some other column (or combination of columns) that is UNIQUE, then make that the PRIMARY KEY and get rid of id.
Which case applies? We can pursue in more detail. Please provide SHOW CREATE TABLE for any table(s) that are relevant.
In my first case (where id is used as a FK elsewhere), do something like this:
While inserting the rows into the table with id, increment the values by enough to avoid colliding with the existing ids. Then do (in the same transaction):
UPDATE the_other_table SET fk_id = fk_id + same_increment.
Repeat for each other table and each id, as needed.
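As a rough sketch of that approach (the staging table names and the offset of 100000 are made up; pick an offset larger than the highest id already in the data center):
-- copy one employee's rows into the central table, shifting the ids
INSERT INTO electronic_parts (id, electronic_part_name, kind_of_electronic_part_id)
SELECT id + 100000, electronic_part_name, kind_of_electronic_part_id
FROM staging_electronic_parts;

-- shift every column that referenced those ids by the same amount
UPDATE staging_customer_order
SET electronic_parts_id = electronic_parts_id + 100000;

-- repeat the same pattern for every table and every id that is used as a FK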
I think your problem comes from your database... you didn't design it well.
It's a bug if two different users end up with the same id.
If you just made your id field unique in your database, then two employees wouldn't have the same id, so your problem is in your table design.
Just define your id field like this and your problem will be solved:
CREATE TABLE [YOUR TABLE NAME](
[ID] int NOT NULL IDENTITY(1,1) PRIMARY KEY,
....
Is it required for the id to be an integer? If not, maybe you can use a prefix on the id so the input for each employee will be unique in general. That means you have to give up auto-increment and just do a count on the table data (assuming you're not deleting any of the records).
You may need to write some code in PHP to handle this. If the other tables already follow a unique/primary-key-based approach, then it is fine.
You can also do it after the import, like this:
Find duplicates in the same table in MySQL
I have to store the below values:
subId 10, Recipient 999999999999, file /home/sach/, status 1.
I used Redis to store these values, e.g.:
HMSET 1 subId 10 Recipient 999999999999 file /home/sach/ status 1
But with Redis I can't query on specific criteria, as Redis can be queried only by its key fields. For example, I need to query only for Recipient 988888888888, but Redis lacks this kind of querying.
Is there any other simple database, except Mongo and MySQL, where I can store these types of values?
With Redis, you just have to handle the secondary indexes manually, by maintaining set or hash objects.
When you add an object, pipeline the following queries:
HMSET 1 subId 10 Recipient 999999999999 file /home/sach/ status 1
SADD subId:10 1
SADD Recipient:999999999999 1
SADD file:/home/sach/ 1
SADD status:1 1
If you need to query the items for a given subId and recipient:
SINTER subId:10 Recipient:999999999999
Then you just need an extra round trip to fetch the data corresponding to the returned ids.
Actually, many distributed NoSQL stores, except the ones which are pure key/value (memcached for instance), can handle secondary indexes (manually or automatically): Couchbase, CouchDB, Cassandra, Riak, Neo4j, OrientDB, HyperDex, ArangoDB, etc.
I want to write a timer that counts up (no preference of language) with controls to start and stop the timer on my server, but that displays the time on the client's computer. I can offer more information if needed.
Thanks!
What you need is a database. I'd go with MySQL (http://dev.mysql.com/downloads/mysql/), since it's good and free. If you're paying for hosting, you might already have access to a database, though.
Then you need some tables to store the necessary info. How you create them depends on your database, but I'll make a rough outline.
You'd typically have a customer table with info about your customer:
| Customer Id | Customer name | Contact person | Phone | E-mail |
Then a table for each of the projects you're doing for your customers (here you'll have a foreign key to the customer table):
| Project Id | Customer Id | Cost per hour | Estimated hours | Start date | Finish date |
And here's the table that will be updated whenever you start or stop working on a project. There will be a new row in this table every time you "stop the timer" on the project (project id is a foreign key to the previous table; customer id is optional, since you can get to the customer through the project table):
| Session Id | Project Id | Customer Id | Start | Stop |
Here "start" and "stop" are timestamps, and session id is an auto-incremented id. Each time you start the timer, you insert a new row into the table with the current time in the start field. Each time you stop the timer, you set the current time in the stop field of the only row for the current project where the stop field is null.
When the customer wants to know the total time spent on the project so far, it's a matter of summing all the intervals (stop - start) for the project.
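A rough MySQL sketch of the session table and the start/stop/sum steps described above (the table and column names are assumptions based on the outline):
CREATE TABLE session (
    session_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    project_id INT NOT NULL,
    customer_id INT,
    `start` TIMESTAMP NULL,
    `stop`  TIMESTAMP NULL
);

-- "Start the timer" for project 1
INSERT INTO session (project_id, customer_id, `start`) VALUES (1, 1, NOW());

-- "Stop the timer": close the single open row for that project
UPDATE session SET `stop` = NOW()
WHERE project_id = 1 AND `stop` IS NULL;

-- Total time spent on the project so far, in seconds
SELECT SUM(TIMESTAMPDIFF(SECOND, `start`, `stop`)) AS total_seconds
FROM session
WHERE project_id = 1 AND `stop` IS NOT NULL;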
To make use of any of this, you need to build a small application in some programming language. I prefer Perl myself, but PHP is probably your best bet, since it's well suited for this kind of thing.
It's hard to go into more specifics until you've made some design choices, but I hope this is enough to give you a general idea of how you can implement it.