How to make a content moderation system? - database

I have implemented a chat that filters bad words. I have a NoSQL database where I store all the users and chats, and my idea is to process every chat, check it for swear words, and, if any are found, pass the chat to the content moderators.
I have never done this before, which is why I am asking how this process is carried out in real life.
I have thought of creating a separate collection in the NoSQL database that only moderators can access, and storing the flagged chats in it as documents. I suppose that is how it would work, but I don't know (maybe it is not right to consume space in the app database). If so, how do professionals share these chats with the moderators? Do they give them direct access to the database, or do they go through a user interface?
The question may seem a bit absurd, or opinion-based, but it is a genuine question. In other words, my intention is to get an idea of how to carry out this process following good practices.
Thanks.
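For illustration only, the idea described above (scan each chat message, and put anything suspicious into a place only moderators see) could be sketched roughly like this. Everything here is an assumption: the word list, the `ModerationQueue` class standing in for the moderator-only collection, and the choice to store a reference and an excerpt rather than a full copy of the chat.

```python
import re
from dataclasses import dataclass, field

# Hypothetical word list; a real system would load this from configuration.
BANNED_WORDS = {"darn", "heck"}


@dataclass
class ModerationQueue:
    """Stands in for a separate, moderator-only collection."""
    items: list = field(default_factory=list)

    def flag(self, chat_id, message, matches):
        # Keep a reference to the chat plus a short excerpt, not a full copy.
        self.items.append({
            "chat_id": chat_id,
            "excerpt": message[:200],
            "matched_words": sorted(matches),
            "status": "pending",
        })


def find_banned_words(message):
    """Return the banned words that appear as whole words in the message."""
    words = set(re.findall(r"[a-z']+", message.lower()))
    return words & BANNED_WORDS


def process_message(queue, chat_id, message):
    """Flag the message for review if it contains a banned word."""
    matches = find_banned_words(message)
    if matches:
        queue.flag(chat_id, message, matches)
    return bool(matches)
```

In a sketch like this, a moderator-facing page would list the `pending` items and update their `status`, so moderators never need credentials for the database itself.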

Related

Mongo DB: Single collection per user with all interactions, or Multiple Collections per argument?

Good Evening.
I'm pretty new to MongoDB and I'm planning to make an app that will work with NoSQL (MongoDB).
The scope of the app is pretty simple:
Register a profile
Request an item from a shopper
Fulfill and send payment notice.
If I were to make this with SQL, I would create a User table, a Request Item table, and a Sending Payment table.
In order to learn something, I would also like to make it with NoSQL, and I chose MongoDB.
I could create 3 collections, put each kind of document in its own collection, and run a search every time I need something.
OR, and this is the question, could I create a collection for EVERY user, and inside each one put every interaction of that same user?
So if I need to search for User10's orders and payments, I would look only inside the User10 collection and search for every item he/she requested.
But on the other hand, how much would this affect me if I need to search all orders in a specific timeframe? It should be slower than SQL, I suppose.
Is this an acceptable way to do it, are there drawbacks I have not yet seen, or is it discouraged in favor of another approach?
The backend would be written in Java, while the app (for... reasons) would be written in Xamarin.Forms.
While this is possible, I would personally recommend against it, as it is considered an anti-pattern; you should read this article about this very topic.
I would ask myself: what advantages am I hoping to gain from this approach? If quick queries at the user level are what you seek, this should not be a problem with sufficient indexes (on user_id and on the timeframe).
There are other standard solutions built to deal with scale, like collection sharding. From my personal experience MongoDB deals with scale very well. It sounds like this is a personal project to learn from, which probably means you'll never really reach hyper scale. The first barrier you'll probably encounter is hardware.
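As a rough sketch of the single-collection approach with indexes, assuming a shared `orders` collection whose documents carry hypothetical `user_id` and `created_at` fields: the helpers below only build the filter documents, so they run without a database, and the comment at the end shows how they would be passed to PyMongo's `create_index` and `find`.

```python
from datetime import datetime

# Hypothetical field names for a single shared "orders" collection.
# Compound index keys in the (field, direction) form that PyMongo's
# create_index accepts: 1 = ascending, -1 = descending.
ORDER_INDEX_KEYS = [("user_id", 1), ("created_at", -1)]


def orders_for_user(user_id, start, end):
    """Filter for one user's orders in a timeframe (served by the compound index)."""
    return {"user_id": user_id, "created_at": {"$gte": start, "$lt": end}}


def orders_in_timeframe(start, end):
    """Filter for every user's orders in a timeframe (wants an index on created_at)."""
    return {"created_at": {"$gte": start, "$lt": end}}


# With a live database this would be used roughly as:
#   db.orders.create_index(ORDER_INDEX_KEYS)
#   db.orders.create_index([("created_at", -1)])
#   db.orders.find(orders_for_user("User10", start, end))
```

With one collection, both the per-user lookup and the "all orders in a timeframe" lookup are ordinary queries; with a collection per user, the second one would have to visit every collection.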

User or User Profile model for app wide relationships

I recently read a tweet that suggested that if one wants to avoid headaches in the future of an app, they should have the user table hold only authentication information and a user profile table for everything else. That is, if you have bikes and peaches in the system, they should be linked to the user that owns them via the user profile id. The tweet was not clear on what the consequences of using the user profile are. Are there maintainability/scalability repercussions to not following this, especially in a large web app?
Well, don't take it as dogma, though it isn't completely worthless. Dependency is a problem: if you have a lot of different data that represents a particular user, you'll change the underlying database often. If everything is stored in a single column, you might find yourself doing the repetitive monkey work of "making it work" with your types, ORM, and whatever else is involved in DB <-> runtime communication.
It is all about splitting a complicated task into smaller, less complex subtasks: auth is a self-standing task, and one of the most important ones, so it definitely deserves some dedicated space. However, your app might not be that big, or not that concerned with users, and then it won't be very helpful to split the data into multiple columns. You must develop a deep sense of purpose and measure when it comes to software design.
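To make the split concrete, here is a minimal sketch of what the tweet seems to describe, using SQLite through Python only so that it runs as-is. The table and column names (`users`, `user_profiles`, `bikes`, `owner_profile_id`) are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users (              -- authentication data only
    id INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE,
    password_hash TEXT NOT NULL
);
CREATE TABLE user_profiles (      -- everything else about the person
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL UNIQUE REFERENCES users(id),
    display_name TEXT NOT NULL
);
CREATE TABLE bikes (              -- domain data points at the profile
    id INTEGER PRIMARY KEY,
    owner_profile_id INTEGER NOT NULL REFERENCES user_profiles(id),
    model TEXT NOT NULL
);
""")
conn.execute("INSERT INTO users VALUES (1, 'a@example.com', 'not-a-real-hash')")
conn.execute("INSERT INTO user_profiles VALUES (1, 1, 'Alice')")
conn.execute("INSERT INTO bikes VALUES (1, 1, 'Roadster')")


def bikes_owned_by(profile_id):
    """Return the models of all bikes linked to the given profile."""
    rows = conn.execute(
        "SELECT model FROM bikes WHERE owner_profile_id = ? ORDER BY id",
        (profile_id,),
    )
    return [model for (model,) in rows]
```

Under this layout the authentication table can change (a new login method, for instance) without touching anything that references the profile.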

Strong consistency in Datastore (HRD)... my idea

I'm hoping that this isn't flagged as "not helpful" because I think that many people are attempting to figure out a way to keep strong consistency in the HRD.
Here is the idea I'm using for my app. I'd like to get your opinions.
I have a fitness app. This is of course made up of Workouts and Exercises.
The HRD contains about 400 exercises to pick from, or the User can create their own Exercise (a UExercise).
When the User logs in, I load all of the Workout keys into a "workoutKeys" List on the User. At the same time I load all the User exercise keys (UExercise) into an "exerciseKeys" List also on the User.
If the user wants to add/delete exercises from a specific workout, the Workout is loaded and all its Exercise keys are loaded into an "exerciseKeys" List on the Workout.
See a pattern here?
So whenever I want to view Exercises created by the user (UExercise), or the user's Workouts, or the Exercises in a Workout, I do a get() using those keys.
Since a user would probably not have 1000's of Workouts, or create 1000's of Exercises, I think this is a safe and quick way to achieve strong consistency.
Not saying that this is the best way for EVERY app. But for mine I believe it will work well.
I would greatly appreciate all of your input as to if there is something I may be missing here, or not properly taking into consideration.
Ok... After some careful consideration of how my app will work, and how users actually use it, I have decided to ditch the idea above and go with Ancestor Queries.
So for the above models, I have come up with the following...
For a Workout, I make the User the parent
For an Exercise created by a user (UExercise), I make the User the parent
This allows me to use Ancestor Queries (which are strongly consistent) to pull the most recently added or modified Entities.
Because the user will not be modifying these Entities en masse, I think the limitations on writes will not be a factor.
This also rids me of properties on Model objects that should not really be there in the first place.
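The parent/child layout can be pictured with a small plain-Python model. This is not the Datastore API; it only assumes that a key is a path of (kind, id) pairs and that an ancestor query returns the entities whose key path starts with the ancestor's path. The helper names are invented for the sketch.

```python
# Each key is a path of (kind, id) pairs; a child's path starts with its parent's.
def make_key(*path):
    return tuple(path)


def has_ancestor(key, ancestor):
    """True if `ancestor` is a proper prefix of `key`."""
    return len(key) > len(ancestor) and key[:len(ancestor)] == ancestor


def ancestor_query(entities, kind, ancestor):
    """Return entities of `kind` stored under `ancestor`, like an ancestor query."""
    return [
        value for key, value in entities.items()
        if key[-1][0] == kind and has_ancestor(key, ancestor)
    ]


user = make_key(("User", "alice"))
entities = {
    make_key(("User", "alice"), ("Workout", 1)): "Leg day",
    make_key(("User", "alice"), ("UExercise", 7)): "Sled push",
    make_key(("User", "bob"), ("Workout", 2)): "Arm day",
}
```

Here `ancestor_query(entities, "Workout", user)` only looks at entities under one User, which is the same grouping that the write limitation mentioned above applies to.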
By the way, I also tried Memcache. I found this to be the ultimate pain. Having to keep the Memcache and the Datastore in sync seemed to inject much more complexity than was really needed.
But your site, and results may differ. This idea works well for my app.
Thanks!

Questions and considerations to ask client for designing a database

As the title says, I would like to hear your advice on the most important questions to consider and to ask end users before designing a database for their application. We are to build a database-oriented app, paying special attention to database security (access control, encryption, integrity, backups)... The database will also keep some personal information about people, which is considered sensitive under legal regulations, so security must be good.
I have worked on school projects with databases, but this is my first time working "in the real world", where database security has real implications.
I have found some advice and questions to ask on the internet, but I always get the best ones here. All help appreciated!
Thank you!
Some other specifics besides what has already been said:
Do you have any regulatory requirements for data access and storage (Sarbanes-Oxley and HIPAA come to mind)?
Do you need to be able to audit record changes?
What internal controls do you need reflected in the database?
What business rules must be followed under what circumstances?
How large do you expect the data to get? The larger the expected data store, the more critical it is to design with performance in mind from the start.
How flexible do you want the system to be (do you want to be able to add columns on the fly, or add business rules)? Be careful with this one; make sure the client understands that flexibility often comes at the cost of performance.
Do you need a separate data warehouse for reporting?
How do you need the data populated? Will it come from an application, multiple applications, data imports, or a combination?
What databases do you currently have licenses for? Do you want this application to use one of them?
Will different groups of users need different access?
How is the process currently being handled? Can we have access to that database or see the current process in action? Observe, for a minimum of one day, the client using the current system. Take extensive notes; you will learn many things no one will think to tell you.
Do you need to migrate data from the old system?
I would start with:
Please explain your business to me.
Which processes are you looking to automate or improve?
Do you have any reports you need to generate?
Do you need inputs to any other systems?
use cases (google for that, it does not need to be drawings, text is fine)
inputs
outputs
static data
historical data
From there you derive the info you need to store, you apply 4th NF, and go !
Good luck ! 8-))

designing database

Basically my job is to develop web applications using a database as the backend. What I have been doing until now is:
Based on the client's requirements, I draw a basic sketch of what the tables are and what they look like
the fields in those tables, and some one-to-one, many-to-one, or many-to-many relations
Although I am not perfect at these things, I try to figure out what the relations should be from past projects that I have worked on. But I still have some doubts about this.
If the client asks for particular data, I try to produce it either through a direct SQL query or through a script (in most cases PHP) if I am unable to figure out a query for that particular request.
Now, here comes my question.
Based on the relationships that I figured out while developing the tables, are there any limitations to what a client can ask? What I mean by this is that the client will ask to list all the individual products, their counts, the associated category, the counts per category, the products in each category and their prices, the sum of all the category prices, the total prices, and so on.
This is just an example of one request to explain my situation.
Now, if there is any request that could potentially take a long time to execute, can the developer satisfy it by breaking the request down?
Do I need to tell him why this breakdown is necessary?
What if he feels that I am not capable of doing it in a single shot?
Does every report that he asks for need to be a single query? Or will I need the help of PHP to process a loop and, based on the values I get, add conditions to apply the rules that the client wants?
What is the better way to do this kind of job?
Any views?
Thanks.
This will generally depend on the database used.
Most queries can be done in a single SELECT, but this should never stop you from looking at views, subselects, or stored procedures.
You should be able to handle most of your queries in this fashion, so I would recommend:
Don't let the output determine how you design the database; this might lead you down the wrong road. You need to store data in the most normalized fashion suitable to the application.
Lots of questions!
"Based on the relationships that I figured out while developing tables, are there any limitations to what a client can ask?"
A client can ask for anything really. Clients aren't always rational. It's part of your job to help the client think through their needs.
"What I mean to say by this is, the client will ask that he wants list all the individual products, their counts, associated category, all the counts of category, the products in each category and their prices, sum of all the category prices and the total prices so on so forth."
All of these queries sound possible with SQL. To list individual products use the SELECT statement. To get a count use COUNT. To get associated categories use JOINS. Use SUM to get total prices.
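As a small illustration of those four pieces together, here is a sketch using SQLite through Python so that it runs on its own. The `products` and `categories` tables, their columns, and the sample rows are assumptions made up for the example, not the asker's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE products (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    price REAL NOT NULL,
    category_id INTEGER NOT NULL REFERENCES categories(id)
);
INSERT INTO categories VALUES (1, 'Fruit'), (2, 'Tools');
INSERT INTO products VALUES
    (1, 'Peach', 2.0, 1), (2, 'Plum', 3.0, 1), (3, 'Hammer', 10.0, 2);
""")

# Products with their category: SELECT plus JOIN.
products = conn.execute("""
    SELECT p.name, c.name, p.price
    FROM products AS p JOIN categories AS c ON c.id = p.category_id
    ORDER BY p.id
""").fetchall()

# Per-category product count and price total: COUNT and SUM with GROUP BY.
per_category = conn.execute("""
    SELECT c.name, COUNT(p.id), SUM(p.price)
    FROM categories AS c JOIN products AS p ON p.category_id = c.id
    GROUP BY c.id ORDER BY c.id
""").fetchall()

# Grand total across all products.
(grand_total,) = conn.execute("SELECT SUM(price) FROM products").fetchone()
```

Each part of the client's request maps to one short query here; whether they are later merged into a single report query is a separate decision.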
"Now, if there is any request that can potentially take longer time for the execution, can the developer satisfy this request by breaking down the request? Do I need to tell him why is this break down necessary?"
Yes - breaking down the request can help a client understand their needs.
"What if he feels that I am not capable of doing it in a single shot?"
Convince him otherwise. You don't want him thinking you're stupid if you want to keep his business. :)
"Is every report that he asks for need to be in single query? or will there be any need to take the help of PHP to process one loop and based on the values that I get, I put some conditions to apply rules that the client wants?"
Really depends on your skill level. If you know SQL well enough you can get most of your data in one query. If you aren't as good, then you might do a few queries and then loop over them in PHP. Typically it is faster to do it all in SQL.
"What is the better way to do this kind of job?"
Are you working for yourself? If so, sometimes it just takes experience to figure out the best way. (and posting to stackoverflow :)
You should look at the general principles of requirement specification and represent your client's needs, for example as user stories, which are tasks that a user will wish to perform. You can then cost each user story as a unit of work. It should be possible to work on one user story at a time, so you can agree on the order in which you will deliver them.
It's best to look at each story / query as separate. This way you can add or remove functionality from the schedule depending on the needs of the client. If you spot common patterns, you can refactor them as you go along.
Many problems come from people trying to over-optimise or over-generalise. I would write each query separately unless you find that they are starting to overlap.
The people paying the bills can ask for anything they want. If what they are asking for really doesn't make sense, then try to make them see reason.
A business requirement shouldn't be changed or removed just because it might be 'hard' to implement.
Design your Database schema to reflect the domain model and normalise to at least 3NF.
Generally, aggregate queries (such as those commonly used to drive reports) can be implemented to utilise indexes and RDBMS specific features to reduce their running time.
It sounds like you just need to improve your data design skills. A properly designed and normalized database, as astander suggests, won't run into the issues you're worried about. But it takes a lot of time to learn the Right Way to design a database if you just keep learning from mistakes. When I was starting out as a web dev, I found Database Design for Mere Mortals a huge help in showing how to avoid painting yourself into corners. There's a companion book about how to write good queries on your databases as well. The two books won't teach you everything there is to know, but they give you a great foundation.
It sounds like you are lacking a bit in your confidence. These are the kinds of problems developers face every day. I think it's good that you recognize a possible weakness and are taking steps to improve. Take some time to learn more about databases and queries.
That said let me answer some of your questions directly:
"Now, if there is any request that can potentially take longer time for the execution, can the developer satisfy this request by breaking down the request?"
Yes, you can break down the request. Not every request can be satisfied in a single query.
"Do I need to tell him why is this break down necessary?"
Only if he asks. As long as you meet the requirement you should be fine. If he knew how to do it a better way then why did he hire you?
"What if he feels that I am not capable of doing it in a single shot?"
Once again, if he knew a better way to code the database and reports, then why did he hire you?
"Is every report that he asks for need to be in single query? or will there be any need to take the help of PHP to process one loop and based on the values that I get, I put some conditions to apply rules that the client wants?"
No, not everything can be done in a single query; it depends on the complexity of the report.
The design of your relational database will have a huge impact on what can or cannot be done (in an effective way). I tend to say it is the most crucial part of your application.
Your process for drawing up your table design is OK, but after that you should review your different use cases and see (with just a pencil and paper) whether your database design is able to cope with each of them.
You could then discuss with your client to make sure some cases will never happen (e.g. "Can you confirm that a Product will always belong to 1 and only 1 Category?").
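Once the client confirms a rule like that, it can be written into the schema so the database itself rejects rows that break it. A minimal sketch, assuming hypothetical `products` and `categories` tables and using SQLite through Python so it runs standalone:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE products (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    -- NOT NULL + single column = exactly one category per product
    category_id INTEGER NOT NULL REFERENCES categories(id)
);
INSERT INTO categories VALUES (1, 'Fruit');
""")


def try_insert(name, category_id):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO products (name, category_id) VALUES (?, ?)",
                (name, category_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

If the client later says a product can sit in several categories, this is the part of the design that would have to change (to a linking table), which is why it is worth asking up front.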
That said, the client can, indeed, ask for anything in the specs. You are free to accept, refuse, or explain to him why his specs are unrealistic. If you develop for a fixed price without clear specs, you're in a bad situation, and it's your fault...
