How to avoid selecting the same row twice in two Cassandra SELECT queries? - database

I am working on a social networking project with Cassandra. Users can subscribe to a profile and have access to the list of people who have subscribed to that same profile. My goal is to retrieve the list of people subscribed to a profile from a table called users_follows.
CREATE TABLE users_follows (to_id text, from_id text, followed_at timestamp, PRIMARY KEY(to_id, from_id))
The problem is that some profiles can have thousands of subscribers and I don't want to get them all at once. That's why I'd like to get the list in increments of 20 depending on how far down the user goes. My problem is that I can't see how to retrieve the other parts of the list after the first select because Cassandra always returns the same users.
SELECT * FROM users_follows where to_id = 'xxxxx'
A possible solution was to sort by a timestamp, but this would not work if I want to retrieve the list of people to whom a user is subscribed (the reverse query). One solution would be to use materialized views, but I'm not sure that would be optimal given the size of the table. Another would be to create two tables, one user_follows and another user_followers, but I don't think this is recommended.
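One way to page through a partition without repeating rows is to resume from the last from_id already shown, since from_id is the clustering column and rows within a to_id partition come back ordered by it. The sketch below illustrates the idea in Python, using the standard-library sqlite3 module as a stand-in for Cassandra (an assumption made only so the example runs anywhere). The function name fetch_page and the sample IDs are hypothetical. The same query shape, WHERE to_id = ? AND from_id > ? LIMIT 20, is also valid CQL for this table.

```python
import sqlite3

PAGE_SIZE = 20

def fetch_page(conn, to_id, last_from_id=None, page_size=PAGE_SIZE):
    """Return the next page of subscriber ids, resuming after last_from_id."""
    if last_from_id is None:
        # First page: start at the beginning of the partition.
        rows = conn.execute(
            "SELECT from_id FROM users_follows WHERE to_id = ? "
            "ORDER BY from_id LIMIT ?",
            (to_id, page_size),
        )
    else:
        # Later pages: skip everything up to and including the last id seen.
        rows = conn.execute(
            "SELECT from_id FROM users_follows WHERE to_id = ? AND from_id > ? "
            "ORDER BY from_id LIMIT ?",
            (to_id, last_from_id, page_size),
        )
    return [row[0] for row in rows]

# sqlite3 stands in for Cassandra here; the table mirrors the one in the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users_follows (to_id TEXT, from_id TEXT, followed_at TEXT, "
    "PRIMARY KEY (to_id, from_id))"
)
conn.executemany(
    "INSERT INTO users_follows VALUES (?, ?, ?)",
    [("profile1", f"user{i:03d}", "2024-01-01") for i in range(45)],
)

first = fetch_page(conn, "profile1")
second = fetch_page(conn, "profile1", last_from_id=first[-1])
third = fetch_page(conn, "profile1", last_from_id=second[-1])
```

The client only has to remember the last from_id it displayed. The reverse listing (profiles a given user follows) filters on from_id alone, so in Cassandra it would likely need its own table partitioned by from_id, which is the two-table option mentioned above.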

Related

Databases - Do tables in a database have to contain all global instances or can they be specific to a certain ID

I'm trying to model a database for Facebook users. For my User table I have {account_name, account_id, ..., password} as columns. I'm trying to create a table for friends of that user, such as a Friendlist or Contacts table, but I'm wondering whether the Contacts table has to list all possible friendships (i.e. {user_id, friend_id, friends_since, ...}) or whether it can be specific to a certain user_id (so {contact_id, friends_since, no_of_mutual_contacts, etc.}).
This is probably just the object-oriented part of my brain coming into action, but take a cinema database as an example. There are 5 cinemas and each has 5 screens; CINEMA is a table and SCREEN is a table. Would the SCREEN table have 25 rows, or is there some way for the SCREEN table to contain only the 5 screens corresponding to a specific CINEMA?
This comes back to my original question: would the Contacts table have to be a global table containing all friendships, or does the Contacts table only return the contacts specific to the user_id?
For some reason, when I look at the relational chart of the database, I see the connections resembling a somewhat object-like structure, where the links are just denoted as members like user.Contacts or user.Events.
Normally a table in a relational database will record all facts of a given type, e.g. user <user_id> is friends with user <friend_id> since <friends_since>.
You can create views for given users if required, or just specify a user_id in your queries, to get the subset of rows applicable to a specific user.
Creating a physical table specific to certain users would create difficulties if you wanted to get all the rows for a certain friend_id.
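As a minimal sketch of that answer, the Python snippet below keeps every friendship in one table and narrows it per user with a WHERE clause or a view. It uses the standard-library sqlite3 module; the table name contacts, the view name contacts_of_user_1, the helper function names, and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One global table records every friendship fact.
CREATE TABLE contacts (
    user_id       INTEGER NOT NULL,
    friend_id     INTEGER NOT NULL,
    friends_since TEXT    NOT NULL,
    PRIMARY KEY (user_id, friend_id)
);
INSERT INTO contacts VALUES
    (1, 2, '2020-01-01'),
    (1, 3, '2021-05-10'),
    (2, 3, '2022-03-15');
-- A view gives a per-user subset without a separate physical table.
CREATE VIEW contacts_of_user_1 AS
    SELECT friend_id, friends_since FROM contacts WHERE user_id = 1;
""")

def friends_of(user_id):
    """Contacts of one user: just filter the global table."""
    rows = conn.execute(
        "SELECT friend_id FROM contacts WHERE user_id = ? ORDER BY friend_id",
        (user_id,),
    )
    return [row[0] for row in rows]

def listed_by(friend_id):
    """Everyone who lists friend_id as a contact; easy with one global table."""
    rows = conn.execute(
        "SELECT user_id FROM contacts WHERE friend_id = ? ORDER BY user_id",
        (friend_id,),
    )
    return [row[0] for row in rows]

view_rows = conn.execute("SELECT friend_id FROM contacts_of_user_1").fetchall()
```

The second helper is the query that becomes awkward with one physical table per user, because it would have to scan every such table.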

Oracle APEX - Data Modeling & Primary Keys

I'm creating a rather large APEX application which allows managers to go in and record statistics for associates in the company. Currently we have a database in Oracle with data from AD which holds all the associates' information: Name, Manager, Employee ID, etc.
Now I'm responsible for creating and modeling a table that will house all the stats for each employee. The table I have created has over 90 columns in it. Some contain data such as:
Documents Processed
Calls Received
Amount of Doc 1 Processed
Amount of Doc 2 Processed
and the list goes on for well over 90 attributes. So here is my question:
When creating this table in my application with so many different columns, how would I go about choosing an appropriate primary key? Should I link it to our employee table using the employee's identification, which is unique (each has an associate number)?
Secondly, how can I create these tables (and possibly a form) to allow me to associate the statistic I am entering for an individual with the actual individual?
I have ordered two books from Amazon on data modeling since I am new to APEX and DBA design. Not a fresh chicken, but new enough to need some guidance. An additional problem I am running into is that each form can have only 60 fields. So I had thought about splitting my 90+ columns into separate tables for different functions.
Thanks
APEX 4.2 allows for 200 items per page.
See: Oracle APEX component limits
A couple of questions come to mind:
Are you sure that the employee IDs are not recyclable? If these IDs are unique and not recycled, you've found yourself a good primary key.
What do you plan on doing when you decide to add a new metric? It seems you would have to add a new column to your rather large and likely not normalized table.
I'd recommend a vertical table for your metrics. You can use Oracle's PIVOT function to make your data appear more like a horizontal table.
If you went this route you would store your employee ID in one column, your metric key in another, and the value in a third.
I'd recommend that you create a metric table consisting of a primary key, a metric label, an active indicator, creation timestamp, creation user id, modified timestamp, modified user id.
This metric table will allow you to add new metrics, change the name of the metric, deactivate a metric, and determine who changed what and when.
This would be a much more flexible approach in my opinion. You may also want to think about audit logs.
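The vertical layout described above can be sketched as follows. This is only an illustration in Python with the standard-library sqlite3 module, not Oracle: SQLite has no PIVOT, so conditional aggregation stands in for it here. The table names metric and employee_metric, the employee IDs, and the values are hypothetical, and the audit columns (creation/modification timestamps and user IDs) are left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Lookup table: one row per metric, so adding a metric is an INSERT, not an ALTER TABLE.
CREATE TABLE metric (
    metric_id INTEGER PRIMARY KEY,
    label     TEXT    NOT NULL,
    active    INTEGER NOT NULL DEFAULT 1
);
-- Vertical table: employee id, metric key, value.
CREATE TABLE employee_metric (
    employee_id INTEGER NOT NULL,
    metric_id   INTEGER NOT NULL REFERENCES metric(metric_id),
    value       INTEGER NOT NULL,
    PRIMARY KEY (employee_id, metric_id)
);
INSERT INTO metric (metric_id, label) VALUES
    (1, 'Documents Processed'),
    (2, 'Calls Received');
INSERT INTO employee_metric VALUES
    (1001, 1, 40),
    (1001, 2, 12),
    (1002, 1, 55);
""")

# Conditional aggregation turns the vertical rows back into one row per employee.
pivoted = conn.execute("""
    SELECT employee_id,
           SUM(CASE WHEN metric_id = 1 THEN value END) AS documents_processed,
           SUM(CASE WHEN metric_id = 2 THEN value END) AS calls_received
    FROM employee_metric
    GROUP BY employee_id
    ORDER BY employee_id
""").fetchall()

# A new metric needs no schema change, only a new row in the lookup table.
conn.execute("INSERT INTO metric (metric_id, label) VALUES (3, 'Amount of Doc 1 Processed')")
metric_count = conn.execute("SELECT COUNT(*) FROM metric").fetchone()[0]
```

An employee with no recorded value for a metric simply has no row, which shows up as NULL in the pivoted result.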

Design for storing recent actions and recently met people

I was wondering how to set up a database for storing actions people have recently done while traveling. For example, if they go to a museum, the database will store the text "Bob went to this museum" along with the user id and a timestamp. I was wondering if these events should be stored in just one table, so that if I want the events of a single person I just query this table with a user id.
On a similar note, I want to store the 50 users a user has "recently met", meaning the last 50 users the user has been around in their travels. I was thinking this could be stored in one table as well, with pairs of user IDs and no duplicates. I'm just afraid the table might get too big.
Any suggestions on table set up?
Thanks
Personally I would go with an ER structure like this:

Better design for blocklists

I have 10-12 items which I need to maintain a blocklist for on my system. Which design is better? These are sample columns; there are many more items to block.
table 1
b_id
b_email
b_name
b_username
b_pagename
b_word
b_IP
comments
table 2
b_id
b_type
text
comments
Basically in table 1, each blocked item is a value in 1 column only, rest are all NULL.
In table 2, each blocked item resides in the only column so there are no NULLs
There are other possible designs too, like a separate table for each item, but then there will be lots of tables just to hold blocklists.
EDIT: The use of this data is to block users from performing certain activities. Each blocked item is used in different places. Example:
block_IP = list of IP addresses that the website will block based on detected user's IP
block_name = list of restricted first/last names users cannot use to signup with
block_email = list of restricted emails users cannot use to signup with
block_username = list of restricted usernames users cannot use to get a profile name
block_pagenames = list of restricted page names users cannot create
block_word = abusive words which users cannot use within content of comments, blogs, etc.
and the list goes on...
So basically these are all like individual lookup items. In an ideal world we would have separate tables for each item, but I don't like the idea of having 20-30 tables just to hold blocked item values. There should be an easier way to manage all this. The only issue is that some items, like block_word, can grow to millions of rows, as there are a lot of words that can be blocked in many languages.
Check out the Entity-Attribute-Value approach or use a schemaless NoSQL datastore.
http://en.wikipedia.org/wiki/Entity-attribute-value_model
If you're processing the 'blocking' in the middle tier, you can just dump the lists as serialized objects (e.g. JSON) into the table.
I assume you're trying to do something like access control lists, which, depending on your platform, you might be able to find a plugin for.
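For comparison, the "table 2" layout from the question (one typed row per blocked value) can be sketched like this. It is a minimal Python illustration using the standard-library sqlite3 module; the table name blocklist, the helper is_blocked, the type labels, and the sample values are hypothetical, and the UNIQUE constraint is an added assumption that also gives the lookup an index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One row per blocked value; b_type says which list the value belongs to.
CREATE TABLE blocklist (
    b_id     INTEGER PRIMARY KEY,
    b_type   TEXT NOT NULL,
    text     TEXT NOT NULL,
    comments TEXT,
    UNIQUE (b_type, text)  -- assumption: no duplicates; also indexes the lookup
);
INSERT INTO blocklist (b_type, text, comments) VALUES
    ('ip',       '203.0.113.7',      'example address'),
    ('email',    'spam@example.com', NULL),
    ('username', 'admin',            'reserved name');
""")

def is_blocked(b_type, value):
    """True if value is on the blocklist of the given type."""
    row = conn.execute(
        "SELECT 1 FROM blocklist WHERE b_type = ? AND text = ? LIMIT 1",
        (b_type, value),
    ).fetchone()
    return row is not None
```

Because every lookup names its b_type, a large list such as blocked words does not slow down checks against the smaller lists, as long as the (b_type, text) index is used.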

How are viewing permissions usually implemented in a relational database?

What's the standard relational database idiom for setting permissions for items?
Answers should be general; however, they should be applicable to the example below. Anything goes: adding columns, adding another table, whatever, as long as it works well.
Application / Example
Assume the Twitter database is extremely simple: we have one User table, which contains a login and user id; we have a Tweet table, which contains a tweet id, tweet text, and creator id; and we have a Follower table, which contains the id of the person being followed and the follower.
Now, assume Twitter wants to enable advanced privacy settings (viewing permissions), so that users can pick exactly which followers can view tweets. The settings can be:
Everyone on Twitter
Only current followers (which would of course have to be approved by the user, this doesn't really matter though) EDIT: Current as in, I get a new follower, he sees it; I remove a follower, he stops seeing it.
Specific followers (e.g., user id 5, 10, 234, and 1)
Only the owner
Under these circumstances, what's the best way to represent viewing permissions? The priorities, in order, are speed of lookup (you want to be able to figure out what tweets to display to a user quickly), speed of creation (you don't want to take forever to post a tweet), and efficient use of space (every time I post a tweet to everyone on my followers' list, I shouldn't have to add a row for each and every follower I have to some table.)
This looks like a typical many-to-many relationship. I don't see any restrictions in what you want that would allow space savings with respect to the typical relational DB idiom for those, i.e., a table with two columns (both foreign keys, one into users and one into tweets). Since the current followers can and do change all the time, posting a tweet to all the followers that are current at the instant of posting (I assume that's what you mean?) does mean adding that many (extremely short) rows to that relationship table. The alternative, keeping a timestamped history of follower sets so you can reconstruct who was a follower at any given tweet-posting time, appears definitely worse in time and not substantially better in space.
If, on the other hand, you want to check followers at the time of viewing (rather than at the time of posting), then you could make a special userid artificially meaning "all followers of the current user" (just like you'll have one meaning "all users on Twitter"); the needed SQL to make the lookup fast, in that case, looks hairy but feasible (a UNION or OR with "all tweets for which I'm a follower of the author and the tweet is readable by [the artificial userid representing] all followers"). I'm not getting deep into that maze of SQL until and unless you confirm that it is this peculiar meaning that you have in mind (rather than the simple one which seems more natural to me but doesn't allow any space savings on the relationship table for the action of "post tweet to all followers").
Edit: the OP has clarified they mean the approach I mention in the second paragraph.
Then, assume userid is the primary key of the Users table; the Tweets table has a primary key tweetid and a foreign key author for the userid of each tweet's author; the Followers table is a typical many-to-many relationship table with the two columns follower and followee (both foreign keys into Users); and the Canread table is a not-so-typical many-to-many relationship table, still with two columns: reader is a foreign key into Users and tweet is a foreign key into Tweets (phew ;-). Two special users, #everybody and #allfollowers, are defined with the above meanings, so that posting to everybody, to all followers, or to "just myself" each adds only one row to Canread; only selective posting to a specific list of N people adds N rows.
So the SQL for the set of tweet IDs a user #me can read is, I think, something like:
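-- :me is a bind parameter holding the reading user's id (#me above);
-- '#everybody' and '#allfollowers' are the ids of the two special users.
-- Tweets addressed to me directly or to everybody:
SELECT Tweets.tweetid
FROM Tweets
JOIN Canread ON (Tweets.tweetid = Canread.tweet)
WHERE Canread.reader IN (:me, '#everybody')
UNION
-- Tweets addressed to all followers, written by authors I follow:
SELECT Tweets.tweetid
FROM Tweets
JOIN Canread ON (Tweets.tweetid = Canread.tweet)
JOIN Followers ON (Tweets.author = Followers.followee)
WHERE Canread.reader = '#allfollowers'
AND Followers.follower = :me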
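To try the query out, the sketch below builds the four tables described above in an in-memory database and runs the same UNION for a given reader. It uses Python's standard-library sqlite3 module; the text user ids (alice, bob, carol, dave), the helper name readable_tweets, and the sample rows are hypothetical, and the two special users are stored as ordinary rows with the ids '#everybody' and '#allfollowers'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users     (userid TEXT PRIMARY KEY);
CREATE TABLE Tweets    (tweetid INTEGER PRIMARY KEY,
                        author TEXT REFERENCES Users(userid));
CREATE TABLE Followers (follower TEXT REFERENCES Users(userid),
                        followee TEXT REFERENCES Users(userid));
CREATE TABLE Canread   (reader TEXT REFERENCES Users(userid),
                        tweet INTEGER REFERENCES Tweets(tweetid));

INSERT INTO Users VALUES ('alice'), ('bob'), ('carol'), ('dave'),
                         ('#everybody'), ('#allfollowers');
INSERT INTO Followers VALUES ('bob', 'alice');   -- bob follows alice
INSERT INTO Tweets VALUES (1, 'alice'), (2, 'alice'), (3, 'alice'), (4, 'alice');
INSERT INTO Canread VALUES
    ('#everybody',    1),   -- public tweet
    ('#allfollowers', 2),   -- visible to whoever follows alice at viewing time
    ('carol',         3),   -- visible to one specific user
    ('alice',         4);   -- visible to the owner only
""")

QUERY = """
SELECT Tweets.tweetid
FROM Tweets
JOIN Canread ON Tweets.tweetid = Canread.tweet
WHERE Canread.reader IN (:me, '#everybody')
UNION
SELECT Tweets.tweetid
FROM Tweets
JOIN Canread ON Tweets.tweetid = Canread.tweet
JOIN Followers ON Tweets.author = Followers.followee
WHERE Canread.reader = '#allfollowers'
  AND Followers.follower = :me
"""

def readable_tweets(me):
    """Sorted ids of the tweets that user `me` is allowed to see."""
    return sorted(row[0] for row in conn.execute(QUERY, {"me": me}))
```

Each of the four privacy settings costs one Canread row per tweet, except the specific-followers case, which costs one row per named reader. Removing bob's row from Followers would hide tweet 2 from him without touching Canread.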
