Better design for blocklists - database

I have 10-12 items for which I need to maintain a blocklist on my system. Which design is better? These are sample columns; there are many more items to block.
Table 1:
    b_id
    b_email
    b_name
    b_username
    b_pagename
    b_word
    b_IP
    comments

Table 2:
    b_id
    b_type
    text
    comments
Basically, in table 1 each blocked item is a value in one column only; the rest are all NULL.
In table 2, each blocked item resides in the single text column, so there are no NULLs.
Other designs are possible too, like a separate table for each item, but then there would be lots of tables just to hold blocklists.
EDIT: The use of this data is to block users from performing certain activities. Each blocked item is used in a different place. Example:
block_IP = list of IP addresses that the website will block based on detected user's IP
block_name = list of restricted first/last names users cannot use to signup with
block_email = list of restricted emails users cannot use to signup with
block_username = list of restricted usernames users cannot use to get a profile name
block_pagenames = list of restricted page names users cannot create
block_word = abusive words which users cannot use within content of comments, blogs, etc.
and the list goes on...
So basically these are all like individual lookup items. In an ideal world we would have a separate table for each item, but I don't like the idea of having 20-30 tables just to hold blocked item values. There should be an easier way to manage all this. The only issue is that some items, like block_word, can grow to millions of rows, as there are a lot of words that can be blocked across many languages.

Check out the Entity-Attribute-Value approach or use a schemaless NoSQL datastore.
http://en.wikipedia.org/wiki/Entity-attribute-value_model
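For illustration, here is a minimal SQL sketch of the single-table design ("table 2" above); the names and types are just assumptions, not a prescribed schema:

CREATE TABLE blocklist (
    b_id     SERIAL PRIMARY KEY,
    b_type   VARCHAR(32)  NOT NULL,   -- e.g. 'ip', 'email', 'word'
    b_value  VARCHAR(255) NOT NULL,   -- the blocked value itself
    comments VARCHAR(255),
    UNIQUE (b_type, b_value)          -- also serves as the lookup index
);

-- Point lookup used at signup, posting, etc.:
SELECT 1 FROM blocklist WHERE b_type = 'word' AND b_value = 'someword';

With the unique index on (b_type, b_value), even millions of block_word rows remain a cheap indexed lookup.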
If you're processing the 'blocking' in the middle tier, you can just dump the lists as serialized objects (e.g. JSON) into the table.
I assume you're trying to do something like access control lists, which, depending on your platform, you might be able to find a plugin for.

Related

How to avoid selecting the same row twice in two Cassandra select queries?

I am working on a social networking project with Cassandra. Users can subscribe to a profile and have access to the list of people who have subscribed to that same profile. My goal is to retrieve, in a table called users_follows, the list of people subscribed to a profile.
CREATE TABLE users_follows (to_id text, from_id text, followed_at timestamp, PRIMARY KEY (to_id, from_id));
The problem is that some profiles can have thousands of subscribers and I don't want to get them all at once. That's why I'd like to get the list in increments of 20 depending on how far down the user goes. My problem is that I can't see how to retrieve the other parts of the list after the first select because Cassandra always returns the same users.
SELECT * FROM users_follows WHERE to_id = 'xxxxx';
A possible solution was to sort by a timestamp, but in case I want to retrieve the list of people a user is subscribed to (the reverse query) this would not work. One solution would be to use materialized views, but I'm not sure that would be very optimal given the size of the table. Or to create two different tables, one user_follows and another user_followers, but I don't think this is very recommended...
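One approach that does work here (a sketch, assuming the table definition above): since from_id is a clustering column, rows within one to_id partition come back sorted by from_id, so you can page with a range condition instead of re-reading from the start:

-- First page of 20 followers:
SELECT * FROM users_follows WHERE to_id = 'xxxxx' LIMIT 20;

-- Next page: pass in the last from_id seen on the previous page.
SELECT * FROM users_follows
 WHERE to_id = 'xxxxx' AND from_id > 'last_from_id_from_previous_page'
 LIMIT 20;

Most Cassandra drivers also expose this automatically via a fetch size and paging state, which avoids managing the cursor by hand.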

NetSuite - UNION ALL equivalent in saved search?

I'm in the process of writing a SuiteTalk integration, and I've hit an interesting data transformation issue. In the target system, we have a sort of notes table which has a category column and then the notes column. Data going into that table from NetSuite could be several different fields on a single entity in NetSuite terms, but several records of different categories in our terms.
If you take the example of a Sales Order, you might have two text fields that we need to bring across as notes. For each of those fields I need to create a row, with both notes fields in the same column but on separate rows. This would allow me to add a dynamic column that gives the category for each of those fields.
So instead of
SO number    notes 1      notes 2
SO1234567    some text1   some text2
You’d get
SO Number    Category     Text
SO1234567    category 1   some text1
SO1234567    category 2   some text2
The two problems I’m really trying to solve here are:
Where can I store the category name? It can’t be the field name in NetSuite. It needs to be configurable per customer as the number of notes fields in each record type might vary across implementations. This is currently my main blocker.
Performance – I could create a saved search for each type of note and bring one row across each time, but that's not an acceptable performance hit when I could do it all in one call.
I use Saved Searches in NetSuite to provide a configurable way of filtering the data to import into the target system.
If I were writing a SQL query, I would use the UNION clause, with the first column being a dynamic column denoting the category and the second column being the actual data field from NetSuite. My ideal would be if I could somehow do a similar thing either as a single saved search, or as one saved search per entity, without having to create any additional fields within NetSuite itself, so that from the SuiteTalk side I can just query the search and pull in the data.
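For clarity, this is the shape of the plain-SQL query being described (table and column names are made up for illustration; a NetSuite saved search won't accept SQL like this):

SELECT so_number, 'category 1' AS category, notes1 AS text FROM sales_orders
UNION ALL
SELECT so_number, 'category 2' AS category, notes2 AS text FROM sales_orders;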
As a temporary kludge, I now have multiple saved searches in NetSuite, one per category, and I encode the category name and an indicator of the record type in the ID of each saved search. I then have a parent search which gives me the searches for that record type. It's very clunky, and ultimately results in far too many round trips for me to be satisfied.
Any idea if something like this is at all possible? Or if not, is there a way of solving this without hard-coding the category values in the front end? Even being able to bring back multiple recordsets in one call would be a performance enhancement.
I've asked the same question on the NetSuite forums but to no avail.
Thanks
At first read it sounds like you are trying to query a set of fields from entities. The fields may be custom fields or built-in fields. Can you not just query the entities with a saved search that has all the potential category columns, and then transform the received data into categories?
Otherwise please provide more specifics, in NetSuite terms, about what you are trying to do.

filemaker database relationships

I'm very new to FileMaker, currently working on a Mac. I've been assigned a new simple system to work towards completing, and I have bumped into some issues with database relationships. I've got experience with PHP/MySQL database connections etc., but FileMaker seems to require a somewhat different mindset and approach.
I'll try to explain this as simply as I can.
Here's the table relationships in my database
What I'm trying to do is a list of "to-do" notes: an interactive menu where the user can add things that need to be done. I've done this with a portal on a layout based on the table "site". The portal is based on the table "todo_notes", which is connected to site through the "site_id".
Here's what it looks like in browse mode
What I'm having problems with is adding a relationship between the todo_notes and the contacts. The contacts live in two separate tables called "county_contacts" and "property_owner_contacts". What I want to accomplish is the possibility for the user to add a single contact from these two tables via a dropdown list. Preferably I'd like to sort of merge these two tables into the same dropdown list.
Let me know if you need any other information or a better explanation of my issue. Any help is very welcome!
If you have a single contacts table with foreign keys for both county and property owner tables, that would let you have a single list for all contacts. From there you could also build a value list based on a relationship, for example to filter only contacts that belong to either county or property owners.
If you then need to further normalize the tables, fields that pertain exclusively to either relationship could be moved to another table from there, as a one-to-one relationship, if that is a concern.
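Expressed in SQL terms purely for illustration (FileMaker defines tables through its UI, so this is just the shape of the suggestion, not literal FileMaker syntax):

CREATE TABLE contacts (
    contact_id        INTEGER PRIMARY KEY,
    full_name         VARCHAR(100),
    county_id         INTEGER,   -- foreign key to the county side; NULL if not a county contact
    property_owner_id INTEGER    -- foreign key to the property owner side; NULL if not an owner contact
);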
The Short Answer
You need to create a Contacts table. FileMaker has no way of dynamically generating value lists. Instead, you can base a value list on any field; therefore, the only way of generating a list of the contact names is to have them all in the same table.
The Long Answer
Because FileMaker only allows us to use ONE field for a value list, we must create a new table for the contact. I would recommend that you replace the two contact tables with a single contact table (seeing as the fields look the same between the two tables) and then add a toggle on the contact for Owner or County. However, you could also create a single contact table for all of the fields that overlap, with foreign keys to the owner and county tables.
You would then use the fullname field from the contact and be good to go.
That is, assuming that you did not want to filter the contacts at all or only show contacts associated with this site.
To start with, I highly recommend using the anchor-buoy method for organizing the relationship graph. Here's an explanation of the anchor-buoy method: http://sixfriedrice.com/wp/six-fried-rice-methodology-part-2-anchor-buoy-and-data-structures/. It's just a convention, but it will help you with the idea of context in FileMaker. It's widely accepted in the FileMaker community as the "right" way to organize a relationship graph. I will continue my explanation using this method.
Each Table Occurrence (TO; the boxes in the graph) represents a unique context from which you can view and edit information. In the anchor-buoy method, each table has only one "anchor" TO. I would recommend using only anchor TOs as the context for your layouts. Then your portal, and any other corresponding information, will be on your buoy TOs. Here is what your new portal relationship would look like. You would select fields from your buoy TOs to use in the portal.
The easiest way to filter your value list to only contacts associated with this site would be to create a foreign key from the contact table to the site, and then add a TO for the contact table to the graph. You would then click the "Include only related values starting from" radio button and specify your new TO.

Designing a data model that incorporates logical operators

I am new to data modeling and I'm having trouble coming up with a data model that can store logic.
The data model would be used to store location and marketing attributes.
When a customer visits one of the company's websites, they would enter in their zip code, and based on their location the attributes would be used to arrange the online catalog of items.
The catalog of items would be separate from the database, so the data model would only produce the output of attributes used to arrange the items. Each item in the catalog has attributes such as ItemNumber, Price, Condition, Manufacture, and marketing segments (Age:Adult, Education: College, Income:High, etc.).
**For example:**
**Input zip code**: 90210
**Output Attributes**: (ItemNumber:123456, Segment:HighIncome, Condition:New)
This example is saying for zip 90210, first show item #123456, followed by all of the items with the HighIncome segment, and then display all of the non-refurbished items.
So far I have 2 tables with a many-to-many relationship, and I would like to add an additional table (or tables) so I can incorporate logic (AND & OR).
The first table would have location and other information about which of the company's site the user is on.
Table Location (
    Location_Unique_Identifier   number,
    ZipCode                      varchar2,
    State                        varchar2,
    Site                         varchar2,
    ..
)
The second table would have the attributes types (Manufacture, Price, Condition, etc.) and the attribute values (IBM, 10.00, Refurbished, etc.).
Table Attributes (
    Attribute_Unique_Identifier   number,
    Attribute_Type                varchar2,
    Attribute_Value               varchar2,
    ..
)
In between these two tables, to break up the many-to-many relationship, I would add the logic table. This table should allow me to output
item#123456 AND (item#768900 OR Condition:New)
The problem I am having with the logic table is making it flexible enough to handle an unknown number of AND/ORs and to handle the grouping.
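One common way to get arbitrary nesting (a sketch only; the table and column names are invented) is to store the expression as a tree: each row is a node, internal nodes carry the operator, and leaves point at attributes.

Table Rule_Node (
    Node_ID                       number,    -- primary key
    Parent_ID                     number,    -- NULL for the root node
    Node_Type                     varchar2,  -- 'AND', 'OR', or 'LEAF'
    Attribute_Unique_Identifier   number     -- set only on LEAF rows
)

item#123456 AND (item#768900 OR Condition:New) then becomes five rows: an AND root, a LEAF child for item#123456, an OR child, and two LEAF rows under the OR for item#768900 and Condition:New.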
This is a typical scenario of JOINing two (or many) tables together to apply AND/OR/XOR or other logical operations.
The best choice is to build a materialized view that denormalizes the attributes from multiple tables into one table (this table is called a view).
In your case, the view may be:
table location_join_attributes {
    number,
    zipcode,
    state,
    site,
    Manufacture,
    Price,
    Condition,
    ......
}
Then you would run your logical statement on this table/view as (modified from your example):
item#123456 OR (item#768900 AND Condition:New) AND (more condition)
If we do not have this view, this operation will first fetch all the records that have item#768900, and then filter against the second table to find which of them have Condition:New. That takes a long time to finish, and if the condition is complex, the performance is terrible.
For quick queries, you should build secondary indexes on the columns you operate on.
On the scalability side, if your business logic changes, you may build a new view and discard the old one. The original tables do not change, which is also one of the advantages of a materialized view.
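As a concrete sketch (PostgreSQL syntax; the junction table location_attributes_map that resolves the many-to-many link is assumed, since it isn't named in the question):

CREATE MATERIALIZED VIEW location_join_attributes AS
SELECT l.Location_Unique_Identifier, l.ZipCode, l.State, l.Site,
       a.Attribute_Type, a.Attribute_Value
  FROM Location l
  JOIN location_attributes_map m
    ON m.location_id = l.Location_Unique_Identifier
  JOIN Attributes a
    ON a.Attribute_Unique_Identifier = m.attribute_id;

-- Secondary index for the common lookup path:
CREATE INDEX idx_lja_zipcode ON location_join_attributes (ZipCode);

-- Rerun whenever the base tables change:
REFRESH MATERIALIZED VIEW location_join_attributes;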

One large table or many small ones in database?

Say I want to create a typical todo web app using a DB like PostgreSQL. A user should be able to create todo-lists. On these lists they should be able to make the actual todo-entries.
I regard the todo-list as an object which has different properties like owner, name, etc., and of course the actual todo-entries, which have their own properties like content, priority, date, etc.
My idea was to create a table for all the todo-lists of all the users. In this table I would store all the attributes of each list. But the question which arises is how to store the todo-entries themselves? Of course in an additional table, but should I rather:
1. Create one big table for all the entries and have a field storing the id of the todo-list they belong to, like so:
todo-list: id, owner, ...
todo-entries: list.id, content, ...
which would give 2 tables in total. The todo-entries table could get very large, although we know that entries expire, so the table only grows with more usage, not over time. Then we would write something like SELECT * FROM todo-entries WHERE todo-list-id = id, where id is the id of the list we are trying to retrieve.
OR
2. Create a todo-entries table on a per user basis.
todo-list: id, owner, ...
todo-entries-owner: list.id, content,. ..
The number of entries tables depends on the number of users in the system. Something like SELECT * FROM todo-entries-owner. Mid-sized tables, depending on the number of entries users make in total.
OR
3. Create one todo-entries-table for each todo-list and then store a generated table name in a field for the table. For instance could we use the todos-list unique id in the table name like:
todo-list: id, owner, entries-list-name, ...
todo-entries-id: content, ... //the id part is the id from the todo-list id field.
In the third case we could potentially have quite a large number of tables. A user might create many 'short' todo-lists. To retrieve a list we would then simply go along the lines of SELECT * FROM todo-entries-id, where todo-entries-id is either stored in a field of the todo-list or derived implicitly by concatenating 'todo-entries' with the todo-list's unique id. Btw: how do I do that? Should it be done in JS, or can it be done in PostgreSQL directly? And very related to this: in the SELECT * FROM <tablename> statement, is it possible to use the value of some field of some other table as <tablename>? Like SELECT * FROM todo-list(id).entries-list-name or so.
The three possibilities go from few large tables to many small ones. My personal feeling is that the second or third solution is better, and I think they might scale better, but I'm not quite sure of that and I would like to know what the 'typical' approach is.
I could go more in depth of what I think of each of the approaches, but to get to the point of my question:
Which of the three possibilities should I go for? (or anything else, has this to do with normalization?)
Follow up:
What would the (PostgreSQL) statements then look like?
The only viable option is the first. It is far easier to manage and will very likely be faster than the other options.
Imagine you have 1 million users, with an average of 3 to-do lists each and an average of 5 entries per list.
Scenario 1
In the first scenario you have three tables:
todo_users: 1 million records
todo_lists: 3 million records
todo_entries: 15 million records
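In PostgreSQL terms the schema could look like this (a sketch; the column names match the queries below):

CREATE TABLE todo_users (
    id       SERIAL PRIMARY KEY,
    username TEXT UNIQUE NOT NULL
);

CREATE TABLE todo_lists (
    id     SERIAL PRIMARY KEY,
    userid INTEGER NOT NULL REFERENCES todo_users (id),
    name   TEXT
);

CREATE TABLE todo_entries (
    id       SERIAL PRIMARY KEY,
    listid   INTEGER NOT NULL REFERENCES todo_lists (id),
    content  TEXT,
    priority INTEGER
);

-- Indexes on the foreign keys keep the per-user and per-list lookups fast:
CREATE INDEX ON todo_lists (userid);
CREATE INDEX ON todo_entries (listid);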
Such table sizes are no problem for PostgreSQL, and with the right indexes you will be able to retrieve any data in less than a second. That applies to simple queries; if your queries become more complex (like: get me the todo_entries for the longest todo_list of the top 15% of todo_users that have made less than 3 todo_lists in the 3-month period with the highest todo_entries entered) they will obviously be slower, as in the other scenarios. The queries are very straightforward:
-- Find user data based on username entered in the web site
-- An index on 'username' is essential here
SELECT * FROM todo_users WHERE username = ?;
-- Find to-do lists from a user whose userid has been retrieved with previous query
SELECT * FROM todo_lists WHERE userid = ?;
-- Find entries for a to-do list based on its todoid
SELECT * FROM todo_entries WHERE listid = ?;
You can also combine the three queries into one:
SELECT u.*, l.*, e.* -- or select appropriate columns from the three tables
FROM todo_users u
LEFT JOIN todo_lists l ON l.userid = u.id
LEFT JOIN todo_entries e ON e.listid = l.id
WHERE u.username = ?;
Use of the LEFT JOINs means that you will also get data for users without lists or lists without entries (but column values will be NULL).
Inserting, updating and deleting records can be done with very similar statements and similarly fast.
PostgreSQL stores data in "pages" (typically 8 kB in size) and most pages will be filled, which is a good thing, because reading and writing a page is very slow compared to other operations.
Scenario 2
In this scenario you need only two tables per user (todo_lists and todo_entries) but you need some mechanism to identify which tables to query.
1 million todo_lists tables with a few records each
1 million todo_entries tables with a few dozen records each
The only practical solution to that is to construct the full table names from a "basename" related to the username or some other persistent authentication data from your web site. So something like this:
username = 'Jerry';
todo_list = username + '_lists';
todo_entries = username + '_entries';
And then you query with those table names. Most likely you will need a todo_users table anyway, to store personal data, usernames and passwords of your 1 million users.
In most cases the tables will be very small and PostgreSQL will not use any indexes (nor does it have to). It will have more trouble finding the appropriate tables, though, and you will most likely have to build your queries in code and then feed them to PostgreSQL, meaning that it cannot optimize a query plan. A bigger problem is creating the tables for new users (todo_lists and todo_entries) or deleting obsolete lists or users. This typically requires behind-the-scenes housekeeping that you avoid with the previous scenario. And the biggest performance penalty is that most pages will hold only a little content, so you waste disk space and lots of time reading and writing those partially filled pages.
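To make that cost concrete: because a table name cannot be a bind parameter, every query in this scenario has to go through dynamic SQL. A PL/pgSQL sketch (the function name is invented; %I quotes the identifier, guarding against SQL injection through the username):

CREATE OR REPLACE FUNCTION get_user_lists(p_username text)
RETURNS SETOF record AS $$
BEGIN
    -- Build and run the per-user query at execution time:
    RETURN QUERY EXECUTE format('SELECT * FROM %I', p_username || '_lists');
END;
$$ LANGUAGE plpgsql;

-- Callers must also spell out the column list, e.g.:
-- SELECT * FROM get_user_lists('Jerry') AS t(id int, owner text);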
Scenario 3
This scenario is even worse than scenario 2. Don't do it, it's madness.
3 million tables todo_entries with a few records each
So...
Stick with option 1. It is your only real option.
