I'm working on a new project and looking into a good approach / best practice for file storage, and for how to efficiently map the files to several resources in a relational data model.
Note: all files are uploaded to a filesystem (not stored in a database).
Option 1
Create a Files table to store the metadata of each file, and create a relation table for each resource (e.g. user_files, product_files, ...).
Questions: will all files be stored in a single table? If I want to fetch the user avatar, do I have to search a table that also contains product images or PDF documents?
Option 2
Store the needed file metadata with each resource.
Downside: each resource can only have one image (which may be fine for a user avatar?).
Option 3
Files are named using a hash of the resource (e.g. user_1243_avatar).
No relation is stored in the database; the URL is built when the resource is fetched.
Are there other options to consider?
All input is welcome.
It's good that you store files in the filesystem. I used option 1 in a recent project and would recommend it.
If I want to fetch the user avatar I have to search in a table that also contains product images or pdf documents?
You make it sound as if you'll have to do a table scan. In my DB, a user's avatar was indicated by an avatar_file_id in the users table, so I could join from users to files directly using an index.
Your user_files sounds too generic. Users could have different files for different purposes; are you just going to lump them all together? Would you create a table person_to_person to lump together friends, marriages, manager/employee, parent/child, etc.? I wouldn't.
Create different tables for different purposes, e.g. a one-to-one table user_avatar (or denormalize it into your users table). Some more examples might be product_images, product_specs, product_helpfiles, etc.
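As a rough illustration of what option 1 can look like in SQL (the table and column names here are assumptions, not taken from your schema), fetching an avatar becomes an indexed join rather than a scan:

    -- Generic metadata table: one row per stored file.
    CREATE TABLE files (
        file_id    INT PRIMARY KEY,
        path       VARCHAR(500) NOT NULL,  -- location on the filesystem
        mime_type  VARCHAR(100),
        created_at TIMESTAMP
    );

    -- One-to-one purpose: the avatar is referenced directly from users.
    CREATE TABLE users (
        user_id        INT PRIMARY KEY,
        name           VARCHAR(100),
        avatar_file_id INT REFERENCES files (file_id)
    );

    -- One-to-many purpose: a product can have several images.
    CREATE TABLE product_images (
        product_id INT NOT NULL,
        file_id    INT NOT NULL REFERENCES files (file_id),
        PRIMARY KEY (product_id, file_id)
    );

    -- Fetch a user's avatar with an indexed join, no scan over all files.
    SELECT f.path
    FROM users u
    JOIN files f ON f.file_id = u.avatar_file_id
    WHERE u.user_id = 42;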
I have a system whereby you can create documents. You select the document type to create and a form is displayed. Data is then added to the form, and the document can be generated. In Laravel things are done via Models. I am creating a new Model for each document, but I don't think this is the best way. An example of my database:
So at the heart of it are projects. I create a new project; I can now create documents for this project. When I select project brief from a select box, a form is displayed whereby I can input:
Project roles
Project Data
Deliverables
Budget
It's three text fields and a standard input field. If I select reporting doc from the select menu, I have to input the data for this document (which is a couple of normal inputs, a couple of text fields, and a date). Although they are both documents, they expect different data (which is why I have created a Model for each document).
The problems: As seen in the diagram, I want to allow supporting documents to be uploaded alongside a document which is generated. I have a doc_upload table for this. So a document can have one or more doc_uploads.
Going back to the MVC structure, in my DocUpload model I can't say that DocUpload belongs to both ProjectBriefDoc and ProjectReportingDoc because it can only belong to one Model. So not only am I going to create a new model for every single document, I will have to create a new Upload model for each document as well. As more documents are added, I can see this becoming a nightmare to manage.
I am after a more generic Model which can handle different types of documents. My question relates to the different types of data I need to capture for each document, and how I can fit this into my design.
I have a design that can work, but I think it is a bad idea. I am looking for advice to improve this design, taking into account that each document requires different input, and each document will need to allow for file uploads.
You don't need to have a table/Model for each document type you'll create.
A more flexible approach would be to have a project_documents table, where you'll have a project_id and some data related to it, and then a doc_uploads table related to the project_documents table.
This way a project can have as many documents as your business will ever need, and each document can have as many files as it needs.
You could try something like this:
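A minimal SQL sketch of that idea (the columns beyond project_id are only assumptions about the data you capture; type-specific fields could live in extra columns or a serialized blob):

    -- One row per generated document, whatever its type.
    CREATE TABLE project_documents (
        id         INT PRIMARY KEY,
        project_id INT NOT NULL,            -- FK to your projects table
        doc_type   VARCHAR(50) NOT NULL,    -- 'brief', 'reporting', ...
        title      VARCHAR(255),
        body       TEXT,                    -- the captured form data
        created_at TIMESTAMP
    );

    -- Supporting files uploaded alongside a document.
    CREATE TABLE doc_uploads (
        id          INT PRIMARY KEY,
        document_id INT NOT NULL REFERENCES project_documents (id),
        file_path   VARCHAR(500) NOT NULL,
        uploaded_at TIMESTAMP
    );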
If you still want to keep both tables, your doc_upload table in your example can have two foreign keys and two belongsTo() Laravel Model declarations without conflicts (it's not a marriage, it's an open relationship).
Or you could use Polymorphic Relations to do the same thing, but that is an anti-pattern of database design (because it won't ensure data integrity at the database level).
For a good reference about Database Design, google for "Bill Karwin" and "SQL Antipatterns".
This guy has a very good Slideshare presentation and a book written about this topic - he used to be an active SO user as well.
OK, I have a suggestion: you don't have to have such tight coupling on the doc_upload references. You can treat doc_upload as a stand-alone table in your model that is not pegged to a single entity. You can still use the ORM to CRUD your way through and manage this table.
What I would do is keep the doc_upload table and use it for all upload references for all documents, no matter which table/model the document resides in, and give it the following fields:
documenttype (the name of the target document's model/table)
documentid_fk (the generic key pointing to a single row in the appropriate document-type table)
So given a document in a given table, you can derive the documenttype from the model object, and you know the id of the document itself because you just pulled it from the db context. With those two values you should be able to pull all related rows from the doc_upload table; see the sketch below.
You may be able to use reflection in your model to know which entity (doc type) you are in, and the key is just the key.
You will still have to create a new model entity for each flavor of project document you wish to have, but that may not be too difficult if the rate of change is small.
You should be able to write a minimal amount of code to pull all related uploaded documents into your app.
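A rough sketch of that table and the lookup it enables (plain SQL; the file_path column is an assumption):

    -- Stand-alone upload table, not tied to any single document table.
    CREATE TABLE doc_upload (
        id            INT PRIMARY KEY,
        documenttype  VARCHAR(100) NOT NULL,  -- e.g. 'ProjectBriefDoc'
        documentid_fk INT NOT NULL,           -- id of the row in that document's table
        file_path     VARCHAR(500) NOT NULL
    );

    -- All uploads for, say, project brief number 7:
    SELECT file_path
    FROM doc_upload
    WHERE documenttype = 'ProjectBriefDoc'
      AND documentid_fk = 7;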
You can use inheritance via a zero-or-one relation in your data model design.
IMO an abstract entity (table) called project-document, containing the shared properties of all documents, will serve you well.
project-brief, project-report, and other document types will be children of the project-document table, with a zero-or-one relation. The primary key of project-document will be both the foreign key and the primary key of the children.
Now a one-to-many relation between project-document and doc-upload solves the problem.
I also suggest adding a unique constraint on {project_id, doc_type} in project-document as a cardinality check (if necessary).
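A rough SQL sketch of that layout, using the fields mentioned in the question (names and child columns are illustrative):

    -- Parent table holds everything common to all document types.
    CREATE TABLE project_document (
        id         INT PRIMARY KEY,
        project_id INT NOT NULL,
        doc_type   VARCHAR(50) NOT NULL,
        UNIQUE (project_id, doc_type)         -- the suggested cardinality check
    );

    -- Child tables share the parent's primary key (the zero-or-one relation).
    CREATE TABLE project_brief (
        id            INT PRIMARY KEY REFERENCES project_document (id),
        project_roles TEXT,
        project_data  TEXT,
        deliverables  TEXT,
        budget        VARCHAR(100)
    );

    CREATE TABLE project_report (
        id          INT PRIMARY KEY REFERENCES project_document (id),
        report_date DATE,
        summary     TEXT
    );

    -- Uploads hang off the parent, so they work for every document type.
    CREATE TABLE doc_upload (
        id          INT PRIMARY KEY,
        document_id INT NOT NULL REFERENCES project_document (id),
        file_path   VARCHAR(500) NOT NULL
    );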
As other answers are sort of alluding to, you probably don't want to have a different Model for different documents, but rather a single Model for "document" with different views on it for your different processes. Laravel seems to have a good "templating" system for implementing views:
http://laravel.com/docs/5.1/blade
http://daylerees.com/codebright-blade/
Context: I recently started a new job. I found my colleagues were exchanging information (product spec sheets, 3D renderings, etc) via files and email, which creates the infuriating situation where there are multiple versions of files being passed around. I decided to start building a solution using FileMaker to resolve this, mainly because I'm not really a technical person and FileMaker seems pretty easy to understand. I have been learning both database design and FileMaker literally from scratch.
Purpose: the solution will need to be able to do the following:
Allow central management of data and files
Export a product roadmap for sales people
Export current product catalogue for sales people
Export product spec sheets
This, in my mind, will help everyone by maintaining a single set of accurate data which can be exported in different views.
Question: What is the best way to incorporate different types of files into the database?
For some views, I would like to be able to show related files, including 3D renderings, images, SoC data sheets, user manuals, etc. What would the schema look like?
Regarding files, I have the following tables:
Files (FileID, FileFormatID, FileName, FileTypeID, FileContainer, DateCommited, DateModified, TimeModified, Comment)
FileFormats (FileFormatID, FileFormat), where FileFormat is svg, pdf, Word, png, jpg, etc...
FileTypes (FileTypeID, FileType), where FileType is 3D Rendering, Gerber, Photo, Certification, QIG, etc...
Solution generated by my feeble mind:
ProductFiles (ProductID, FileID), where ProductID is the key in a Products table.
SoC_Files (SoC_ModelNo, FileID), where SoC_ModelNo is the key in an SoC table.
This way I can include in my views a list of files related to a product or SoC, showing only the FileTypes or FileFormats I need.
However, this seems messy. Is there a better way to do this?
Thanks! It's my first question on StackOverflow, so please let me know if the question is unclear or inappropriate in any way.
EDIT: The SoCs are not products themselves, they're used in the products. Some customers want that information. Each file can belong to multiple products or SoCs, and each product or SoC can have more than one file.
I suspect we need more information about what your solution is about. If it's chiefly about documentation, then the differences between the objects being documented are most likely irrelevant.
In any case, you describe a many-to-many relationship between Files and Products - so you should have a join table between these two, where each combination of file-to-product will be stored as an individual record.
If it turns out that you do need a separate SoC table, you could turn the join table into a "star-join" table - meaning it would have fields for:
FileID
ProductID
SoCID
and in each record either the ProductID or the SoCID field would be empty.
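In SQL terms the star-join table would look roughly like this (purely conceptual, since FileMaker tables are defined through its own Manage Database UI; field names mirror the ones above):

    -- Star-join table: each row links a file to EITHER a product OR an SoC,
    -- so exactly one of ProductID / SoCID is filled in per record.
    CREATE TABLE FileJoins (
        FileID    INT NOT NULL,   -- points to Files
        ProductID INT,            -- points to Products, NULL when the row links an SoC
        SoCID     INT             -- points to SoC, NULL when the row links a product
    );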
Note that in FileMaker you have another option to establish a many-to-many relationship: you could use a checkbox field in the Files table to select the products which the file documents. However, in that case, (1) you won't be able to record anything about a specific file-to-product join, and (2) it will be more difficult to produce a report of files-by-product or vice versa.
The FileFormats table is redundant and can be replaced by a custom value list: file extensions are unique and unchanging, and you have nothing to record about any of them. I have a feeling the same is true about the FileTypes table.
An exception to the above: if you can have multiple versions of the same file in different formats, you may need to add another table for the physical files.
In the TYPO3 backend I get a list of the entries from the database table "Books" via a SysFolder. I can create new books, edit books, etc.
I also have a database table "Extrainformation", which should hold the extra information about a book. In "Extrainformation" there is a key "Book_id" that connects the two tables.
What I am trying to achieve is that when I create a new record via this SysFolder, some of the fields are saved in a different table.
For example, when I have the input fields:
Bookname
Book description
Book Publishdate
Extrafield1
Extrafield2
I would like the Bookname, Book description, and Book Publishdate to be saved in the "Books" table, and Extrafield1 and Extrafield2 to be saved in the "Extrainformation" table. (And when I then edit a book, the form should load the data from both tables.)
Has someone done something like this before? Is there an easy way to combine database information from multiple tables via a SysFolder? If there is no "easy" way, does someone know where it would be possible to "hack" the saving and loading of data so that the data from both tables can be merged into one form?
You are looking for "Inline Relational Record Editing" (IRRE).
BTW, there is nothing special about folders in the page tree. Technically there is no difference from "normal" pages, except that they will not be rendered in the frontend.
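On the database side, the relation that IRRE edits is just a foreign key from the child table to the book. As a rough sketch only (real TYPO3 tables carry additional system columns such as pid, and the field names here simply follow the question):

    -- Parent records edited in the SysFolder.
    CREATE TABLE Books (
        uid             INT PRIMARY KEY,
        bookname        VARCHAR(255),
        bookdescription TEXT,
        publishdate     DATE
    );

    -- Child records edited inline (IRRE) on the book form.
    CREATE TABLE Extrainformation (
        uid         INT PRIMARY KEY,
        book_id     INT NOT NULL REFERENCES Books (uid),
        extrafield1 VARCHAR(255),
        extrafield2 VARCHAR(255)
    );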
I'm writing a small search engine in C with curl, libxml2, and mysql. The basic plan is to grab pages with curl, parse them with libxml2, then iterate over the DOM and find all the links. Then traverse each of those, and repeat, all while updating a SQL database that maintains the relationship between URLs.
My question is: how can I best represent the relationship between URLs?
Why not have a table of base URLs (i.e. www.google.com/) and a table of connections, with these example columns:
starting page id (from url table)
ending page id (from url table)
the trailing directories of the URLs as strings in two more columns
This will allow you to join on certain URLs and pick out the information you want.
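A hedged sketch of those two tables (MySQL-style, since that's the stack mentioned; the names are made up):

    -- One row per site / base URL.
    CREATE TABLE base_urls (
        id       INT AUTO_INCREMENT PRIMARY KEY,
        base_url VARCHAR(255) NOT NULL        -- e.g. 'www.google.com/'
    );

    -- One row per discovered link between two pages.
    CREATE TABLE connections (
        id            INT AUTO_INCREMENT PRIMARY KEY,
        start_page_id INT NOT NULL,           -- from base_urls
        end_page_id   INT NOT NULL,           -- from base_urls
        start_path    VARCHAR(500),           -- trailing directory of the source URL
        end_path      VARCHAR(500)            -- trailing directory of the target URL
    );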
Your solution seems like it would be better suited to a non-relational datastore, such as a column store.
Most search engine indices aren't stored in relational databases, but in memory, so as to minimize retrieval time.
Add two fields to the table: 'id' and 'parent_id'.
id - a unique identifier for the URL
parent_id - the link between URLs
If you want to have a single entry for each URL then you should create another table that maps the relationships.
You then look up the URL table to see if the URL exists; if not, create it.
The relationship table would have
SourceUrlId,
UrlId
where SourceUrlId is the page and UrlId is the URL it points to. That way you can have multiple relationships for the same URL, and you won't need a new entry in the URL table for every link to that URL. It will also mean there is only one copy of any other info you are storing.
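A rough sketch of that layout and the kind of lookup it enables (MySQL-style SQL; the exact columns are illustrative):

    -- Each URL stored exactly once.
    CREATE TABLE urls (
        id  INT AUTO_INCREMENT PRIMARY KEY,
        url VARCHAR(2048) NOT NULL
    );

    -- One row per link: source_url_id points at url_id.
    CREATE TABLE url_links (
        source_url_id INT NOT NULL,   -- the page containing the link
        url_id        INT NOT NULL,   -- the page it points to
        PRIMARY KEY (source_url_id, url_id)
    );

    -- Every page a given page links to:
    SELECT target.url
    FROM url_links l
    JOIN urls target ON target.id = l.url_id
    WHERE l.source_url_id = 1;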
Why are you interested in representing the page graph?
If you want to compute the ranking, then it's better to have a more succinct and efficient representation (e.g., matrix form if you want to compute something similar to PageRank).
I would like some database/programming suggestion on a specific issue.
I have 5 different people (that live in different parts of the world) who provide me with data. This data is given to me in a variety of ways, following a standard structure/layout. However, it's not always harmonized: the data might have extra things that are not in the standard, so I'd like the structure to be as dynamic as possible to accommodate what each person wants to provide.
These 5 data sources are then placed inside a central database I host. So basically I have 5 data sources that are formatted following a standard structure, and they are uploaded to my local database.
I want to automate the upload of this data as much as possible for the person providing the data, so I want them to upload new sets of data that are automatically inserted in my local db.
My questions are:
How should I keep the structure dynamic without having to revisit my standard layout to accommodate new fields of data, or different structure?
How do I make them upload data in a way that is incremental? For example, they might be uploading an XML version of their data; my upload code should figure out what already exists.
My final and most important question. Are there better ways of going about this instead of having an upload infrastructure?
How should I keep the structure dynamic without having to revisit my standard layout to accommodate new fields of data, or different structure?
Basically, you pivot the normal database idea of columns and rows.
You have a data name table, which consists of the unique names of the fields of data, and an indicator to tell the import process what type of data is stored, like a date, timestamp, or integer.
You have a data table, which contains the data name id, a sequence number, the data field, and a foreign key to identifying information.
The sequence number is used to differentiate between different values of the same data name.
The data field holds every type of data possible. This would be a VARCHAR(MAX) in most databases. It's up to the upload process to convert dates and numbers to strings.
Your data table will have a foreign key to the rest of the information that identifies who the data field belongs to.
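A minimal sketch of those two tables (generic SQL; TEXT stands in for VARCHAR(MAX), and the column names are illustrative):

    -- Catalogue of field names plus how the import should interpret their values.
    CREATE TABLE data_names (
        data_name_id INT PRIMARY KEY,
        name         VARCHAR(100) NOT NULL,  -- e.g. 'birth_date', 'weight'
        data_type    VARCHAR(20)  NOT NULL   -- 'date', 'timestamp', 'integer', ...
    );

    -- The actual values: one row per field, per value, per owning record.
    CREATE TABLE data_values (
        owner_id     INT NOT NULL,            -- FK to whatever identifies the data's owner
        data_name_id INT NOT NULL REFERENCES data_names (data_name_id),
        sequence_no  INT NOT NULL,            -- distinguishes repeated values of one name
        data_value   TEXT,                    -- everything stored as a string
        PRIMARY KEY (owner_id, data_name_id, sequence_no)
    );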
How do I make them upload data in a way that is incremental? For example, they might be uploading an XML version of their data; my upload code should figure out what already exists.
The short answer is that you can't.
Your upload process has to identify duplicate data and not store it in the database.
My final and most important question. Are there better ways of going about this instead of having an upload infrastructure?
This is a hard question to answer without knowing more about the type of data you're receiving, but there is software that allows you to load databases without a lot of programming, by defining the input data structure and mapping that structure to your database tables.
This is a very general question, but I think I have a general answer. What I think solves your problem is a relational design in which the properties attached to the master record are not pre-determined. Here is an example involving a phone book application.
Common method using a non-relational table:
Table PERSON has columns Name, HomePhone, OfficePhone.
All well and good, but what do you do if the occasional person shows up with a mobile phone, more than one mobile phone, a fax phone, etc.?
Instead what you do is:
Table Person has columns Person_ID, Name.
Table Phones has columns Person_ID, Phone_Type, PhoneNumber.
There is a one-to-many relationship between Person and Phones, and there can be any number of them from zero to a zillion. The tables are JOINed by Person_ID. You have to have business and presentation logic that enumerates the Phone_Type column (or just let it be free-form, which is not as useful but easier).
You can do that for any property, and that is what relational databases are all about. I hope this helps.
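In SQL the two shapes look roughly like this (a sketch, with illustrative types and sample data):

    -- Rigid shape: one column per kind of phone.
    CREATE TABLE person_flat (
        name         VARCHAR(100),
        home_phone   VARCHAR(30),
        office_phone VARCHAR(30)
    );

    -- Flexible shape: any number of phones per person.
    CREATE TABLE person (
        person_id INT PRIMARY KEY,
        name      VARCHAR(100) NOT NULL
    );

    CREATE TABLE phones (
        person_id    INT NOT NULL REFERENCES person (person_id),
        phone_type   VARCHAR(20) NOT NULL,   -- 'home', 'office', 'mobile', 'fax', ...
        phone_number VARCHAR(30) NOT NULL
    );

    -- All of one person's numbers, JOINed by Person_ID:
    SELECT ph.phone_type, ph.phone_number
    FROM person p
    JOIN phones ph ON ph.person_id = p.person_id
    WHERE p.name = 'Alice';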
As others have said, EAV tables can handle a dynamic structure (but be aware of performance issues on large tables).
But is it in your interest to have your database fields dictated by the client? You can't write business logic to act upon those new fields because they don't exist yet; they could be anything.
Can you force the client to conform to your model? This allows you to know the fields ahead of time and have business logic act upon the fields. It allows you to write meaningful reports as well, rather than just pivoted data dumps.