Best practice for storing images in a database?

I'm creating a web application that needs to store various images. A good example would be an event site:
There are events that have Flyers + Photos (belonging to an event) and Locations that also have photos (belonging to a location).
But I'm having trouble designing an ideal solution for all this, because I want to be very flexible. Photos should be available in multiple resolutions, and Flyers could have a back side and thumbnails as well.
Currently I just store the path to the original image in the database and "calculate" the other paths by adding "_small" or "_800x600" to the file name. With this approach, however, I have to clean up all the thumbnails when changing a flyer, because I don't keep track of the thumbnails.
So what is the best practice for this? Should I store the paths for the thumbnails too? Are there any famous examples on how to do this? It seems like a rather common problem.
Thanks in advance.
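For reference, the naming scheme described above can be sketched as follows. This is only a minimal illustration in Python, assuming a fixed list of variants; the function names and the VARIANTS tuple are hypothetical, and only the "_small" and "_800x600" suffixes come from the question. Deriving every path from one list means the cleanup step knows which files to delete without storing each thumbnail path.

```python
from pathlib import Path

# Hypothetical list of variants; the two suffixes are the ones named in the question.
VARIANTS = ("small", "800x600")


def variant_path(original: str, variant: str) -> Path:
    """Derive a variant's path by appending "_<variant>" to the file stem."""
    p = Path(original)
    return p.with_name(f"{p.stem}_{variant}{p.suffix}")


def all_variant_paths(original: str) -> list[Path]:
    """Every derived file that belongs to the given original."""
    return [variant_path(original, v) for v in VARIANTS]


def remove_variants(original: str) -> list[Path]:
    """Delete the derived files that exist and return the ones removed."""
    removed = []
    for path in all_variant_paths(original):
        if path.exists():
            path.unlink()
            removed.append(path)
    return removed
```

Calling remove_variants() before replacing a flyer would clear the stale thumbnails; whether that is preferable to storing each thumbnail path in the database depends on how often the set of variants changes.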

Related

Approach for building a Gallery of images (ParentalKey vs StreamField)

I'm trying to decide between using ParentalKey or StreamField, in this case, with the purpose of adding an image gallery to a page.
We don't need any information other than the image itself (the image will be an instance of the wagtailimages.Image model anyway, so the alt text is already handled there).
Any thoughts on what is better to make it easier for the editor to work even if maintaining around 50 images per page?
And about good practices and code maintainability?
Would you prefer a third party package for the image gallery? (even if that blocks you from upgrading to wagtail 4?)
Would your opinion change if instead of a single image, we needed some more fields?
Many thanks!
For an image gallery, the first recommendation would be to use the Collections feature. You can get pretty far with the nested collection system and even add extra metadata if needed by adding a model that relates to the collection.
If that is not suitable, ParentalKey/InlinePanel would be my next pick. For simple relationships you get all the benefits of StreamField such as re-ordering, add/remove items but with solid database integrity and usage stats working out of the box.
Only go to StreamField if you need to have optional data set against each image. For example if you have an image list but images could be an Image with a RichText OR just an image.
Unfortunately, managing large sets of images is not great (outside of collections), so you may find you need to build a separate UI for this. If that ends up being the case, data that already lives in model relations will be easier to migrate, or may not need migrating at all with something like ModelAdmin.
Hope it goes well, be sure to write a blog post about what you end up doing.
I would use the ParentalKey with InlinePanel for that. It shows you all the images as a list in a more compact way than the StreamField. One can reorder this list.
A StreamField is more expandable in the future. You could add new blocks, like videos or quotes or whatever, at any point. If you define each block as a StructBlock, you will be able to add whatever you want to these blocks in the future without losing existing data (this is also true for the ParentalKey model).
I would not use Collections for image slideshows, as you won’t be able to sort the images in a collection via the CMS, right? Collections are meant to keep order in the backend, I think.

How can I create an add folder functionality to my React CMS application?

Click here to see a picture of what I mean
I haven't tried anything yet because I'm not sure how to even approach this problem. I'm not even sure what to Google. I do, however, have a pretty good handle on React. Thanks!
Update: The folders will not be storing files, just hyperlinks.
You need to model the problem space first, i.e. models for folders and files, each having properties (name, etc.) and associations (folders can have many files and subfolders).
To store the physical files you can use a third-party service like Amazon S3.
This would get you started at least.
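As a rough illustration of that modelling step, here is a minimal sketch in Python (the asker's app is React, so treat this as language-neutral pseudostructure). The class and method names are hypothetical. Since the update says the folders only hold hyperlinks, the "file" model is a simple Link.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    """A hyperlink stored inside a folder (hypothetical model)."""
    name: str
    url: str


@dataclass
class Folder:
    """A folder that can hold links and nested subfolders."""
    name: str
    links: list[Link] = field(default_factory=list)
    subfolders: list["Folder"] = field(default_factory=list)

    def add_folder(self, name: str) -> "Folder":
        """Create a subfolder, attach it to this folder, and return it."""
        child = Folder(name)
        self.subfolders.append(child)
        return child

    def walk(self, prefix: str = ""):
        """Yield (path, link) pairs for every link in this folder and below."""
        path = f"{prefix}/{self.name}"
        for link in self.links:
            yield path, link
        for sub in self.subfolders:
            yield from sub.walk(path)
```

An "add folder" button would then boil down to calling add_folder() on whichever folder is currently selected and re-rendering the tree.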

How save text, svg, html, css all together efficiently

In an application, I am using Fabric.js, which lets users write text, draw SVG's, insert images etc.
I want to know, what is the best way to store this data.
Requirements are:
Ability to query the data (text), which tells me that I should store it in a DB (MySQL as of now)
I have images, and I am targeting the iPad as well, so how the images are stored is important.
SVG's and HTML/CSS to be saved as well.
I also want to do versioning of the content, as Quora does it, so that a user can see the changes from the past version to the current version. This also includes the versioning of images and SVG's.
I am wondering how Google Docs does it, because they also store our documents, drawings etc.
What is the best way of doing this?
I don't know if it helps, but the Opera browser offers an option to save a web page as a single file (.mht extension). It stores all the resources (CSS, images, scripts, etc.) as base64-encoded text for later use when the document is opened. Maybe this could be a way to store the data :P
I manage a webapp where users generate reports, and found it more efficient to store images and binary files in the filesystem, and link to them from the database. Elements that are in xml or text are kept in the database for easier searching - in your case this would include css/html and svg (which is xml). Use the database for managing revisions.
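A minimal sketch of that layout, assuming SQLite through Python's standard sqlite3 module purely for illustration (the question uses MySQL, and all table, column, and function names here are hypothetical): text-like content sits in a revision table where it can be searched, and binary files are only referenced by path.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE document (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE revision (
    id INTEGER PRIMARY KEY,
    document_id INTEGER NOT NULL REFERENCES document(id),
    version INTEGER NOT NULL,
    body TEXT NOT NULL,          -- text, HTML/CSS or SVG markup, searchable
    UNIQUE (document_id, version)
);
CREATE TABLE asset (
    id INTEGER PRIMARY KEY,
    revision_id INTEGER NOT NULL REFERENCES revision(id),
    path TEXT NOT NULL           -- location of the binary file on disk
);
""")


def add_revision(conn, document_id, body, asset_paths=()):
    """Store a new revision with the next version number and link its files."""
    (next_version,) = conn.execute(
        "SELECT COALESCE(MAX(version), 0) + 1 FROM revision WHERE document_id = ?",
        (document_id,),
    ).fetchone()
    cur = conn.execute(
        "INSERT INTO revision (document_id, version, body) VALUES (?, ?, ?)",
        (document_id, next_version, body),
    )
    conn.executemany(
        "INSERT INTO asset (revision_id, path) VALUES (?, ?)",
        [(cur.lastrowid, p) for p in asset_paths],
    )
    return next_version
```

Because every revision keeps its own rows, showing the changes between two versions becomes a matter of selecting both bodies and diffing them in the application.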
Might also check out this thread on storing images in a database.
It looks like Fabric.js is using the node.js JavaScript web server on the backend. I haven't used it before, but you might investigate which databases are easiest to use with node.js:
node.js database
nodejs and database communication - how?
nodejs where to start?
If you want to query the text efficiently, then perhaps putting all bits of information into the DB separately is the most efficient. Maybe you wish to play with OOXML or ODF, which may serve as a container for all the information you require, and then use XML storage (e.g. eXist) to store and query it (e.g. the text). As these standards are XML-based, you can transform them into HTML, but writing an online editor for this is something only a giant like Google can realistically do.
You can take a look at NoSQL databases like MongoDB or CouchDB.
See also Storing images in NoSQL stores

How to scrape logos from websites?

First off, this is not a question about how to scrape websites. I am fully aware of the tools available to me to scrape (css_parser, nokogiri, etc. I'm using Ruby to do the scraping).
This is more of an overarching question on the best possible solution to scrape the logo of a website starting with nothing but a website address.
The two solutions I've begun to create are these:
Use Google AJAX APIs to do an image search that is scoped to the site in question, with the query "logo", and grab the first result. This gets the logo, I'd say, about 30% of the time.
The problem with the above is that Google doesn't really seem to care about CSS image-replaced logos (i.e. H1 text that is replaced with the logo image). The solution I've tentatively come up with is to pull down all CSS files, scan for url() declarations, and then look for the words header or logo in the file names.
Solution two is problematic because of the many idiosyncrasies of all the people who write CSS for websites. They use Header instead of logo in the file name. Sometimes the file name is random, saying nothing about a logo. Other times, it's just the wrong image.
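For illustration, the url() scan from solution two can be sketched like this. It is written in Python rather than the asker's Ruby, and the regular expression, the KEYWORDS tuple, and the function name are all assumptions, not a tested production pattern. It inherits exactly the weakness described above: it only finds files whose names happen to contain one of the keywords.

```python
import re
from urllib.parse import urljoin

# Matches url(...) with optional single or double quotes around the target.
CSS_URL = re.compile(r"""url\(\s*['"]?([^'")]+?)['"]?\s*\)""", re.IGNORECASE)
KEYWORDS = ("logo", "header")


def logo_candidates(css_text: str, css_url: str) -> list[str]:
    """Return absolute URLs from url() declarations whose file name hints at a logo."""
    found = []
    for raw in CSS_URL.findall(css_text):
        # Paths in a stylesheet are relative to the stylesheet, not the page.
        absolute = urljoin(css_url, raw.strip())
        filename = absolute.rsplit("/", 1)[-1].lower()
        if any(word in filename for word in KEYWORDS):
            found.append(absolute)
    return found
```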
I realize I might be able to do something with some sort of machine learning, but I'm on a bit of a deadline for a client and need something fairly capable soon.
So with all that said, if anyone has any "out of the box" thinking on this one, I'd love to hear it. If I can create a solution that works well enough, I plan on open-sourcing the library for any other interested parties :)
Thanks!
Check this API by Clearbit. It's super simple to use:
Just send a query to:
https://logo.clearbit.com/[enter-domain-here]
For example:
https://logo.clearbit.com/www.stackoverflow.com
and get back the logo image!
More about it here
I had to find logos for ~10K websites for a previous project and tried the same technique you mentioned of extracting the image with "logo" in the URL. My variation was I loaded each webpage in webkit so that all images were loaded from CSS or JavaScript. This technique gave me logos for ~40% of websites.
Then I considered creating an app like Nick suggested to manually select the logo for the remaining websites, however I realized it was more cost effective to just give these to someone cheap (who I found via Elance) to do the work manually.
So I suggest you don't bother solving this properly with a fully technical solution; outsource the manual labour instead.
Creating an application will definitely help you, but I believe in the end there will be some manual work involved. Here's what I would do.
Have your application store in a database a link to all images on a website that are larger than a specified dimension so that you can weed out small icons.
Then you can setup a form to access these results. You may want to setup the database table to store the website url and relationship between the url and image links.
Even if it were possible to write an application that truly figures out whether an image is a logo, it seems like it would take a massive amount of code. In the end it would probably weed out even more than the approach above, but you have to take into account that it could be faster for a human to visually parse the results than for you to write and test the complex code.
Yet another simple way to solve this problem is to collect all leaf nodes and take the first image wrapped in a link, such as:
<a><img src="http://example.com/a/file.png" /></a>
You can look for existing projects that extract HTML leaf nodes, or use regular expressions to match all HTML tags.
I used a C# console app with the HtmlAgilityPack NuGet package to scrape logos from more than 600 sites.
The algorithm is to collect all images that have "logo" in the URL.
The challenges you will face during such extraction are:
Relative image URLs
A base URL on a CDN with an unknown scheme, HTTP or HTTPS (if you don't know the protocol before you make a request)
Images with ? or & and a query string at the end
With those things in mind I got approximately a 70% success rate, but some images were not actual logos.
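The same idea can be sketched with Python's standard library instead of C# and HtmlAgilityPack. This is not the code I used, just a minimal illustration with hypothetical names. It shows one way to handle the three challenges listed above: urljoin resolves relative and protocol-relative URLs against the page URL, and only the path is checked so query strings do not interfere.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LogoImageFinder(HTMLParser):
    """Collect <img> sources whose URL path contains "logo"."""

    def __init__(self, page_url: str):
        super().__init__()
        self.page_url = page_url
        self.candidates = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if not src:
            return
        # Resolves relative paths and "//cdn..." URLs; the scheme is
        # inherited from the page URL when the src does not specify one.
        absolute = urljoin(self.page_url, src)
        # Check only the path so a query string cannot cause a false match.
        if "logo" in urlsplit(absolute).path.lower():
            self.candidates.append(absolute)


def find_logos(html: str, page_url: str) -> list[str]:
    """Return absolute URLs of images on the page that look like logos."""
    finder = LogoImageFinder(page_url)
    finder.feed(html)
    return finder.candidates
```

Images that match the keyword but are not actual logos still have to be filtered out by hand.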

Is it possible to search the database for related files when a file is loaded in an upload form?

I have an idea for a site that involves uploading files to the site. What I'd like to know is whether, when a user clicks on "Browse" and selects a file, the site can automatically scan its database for similar files before the file is uploaded. It would be similar to the automatic "Related Questions" that appear when you ask a question on this site.
Sure, that's possible. But you'll have to come up with your own definition of "similar", as well as an algorithm for finding similar files.
File Type differences
Different file types should be compared differently. For example a text file would be well suited to a diff to find similar files, but comparing images or videos that are similar is considerably more difficult.
Difficulty of comparisons
Also, comparing against a large number of files is a very expensive thing to do since it's typically done pair-wise. Some indexing methods could help the efficiency of the search though, but I don't see an easy way to do this quickly.
Crowd Source Alternative
Another alternative would be to have the users of the site point out the similarities, that way you simply display a list of the most popular files that were voted similar. Of course, this doesn't help when uploading a new file, but it can help you gain insight as to what users find similar.
What many sites do to compare similarity of content is to allow users to tag items. If one item shares many of the same tags with another, they're likely similar. This is probably the easiest approach.
This also has the benefit that any content type can be compared to any other content type. So text files that have the same tags as a video can be presented as similar.
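One simple way to turn "shares many of the same tags" into a number is the Jaccard index, i.e. the number of shared tags divided by the number of distinct tags across both items. The sketch below is just an illustration in Python; the function names and the in-memory catalog are hypothetical, and a real site would run the equivalent query against its tag tables.

```python
def tag_similarity(a: set[str], b: set[str]) -> float:
    """Jaccard index: shared tags divided by all distinct tags."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def most_similar(tags: set[str], catalog: dict[str, set[str]], limit: int = 5) -> list[str]:
    """Rank catalog items by tag overlap with the given tag set, best first."""
    scored = [(tag_similarity(tags, other), name) for name, other in catalog.items()]
    # Drop items with no tag in common, then sort by score (ties by name).
    scored = [(score, name) for score, name in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:limit]]
```

Since only tags are compared, a text file and a video can be ranked against each other without inspecting their content.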
It's possible to get the file name without uploading the file so you can do the search based on the file name. The content would only be available after the upload.
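On the server side, a file-name search could be as simple as a fuzzy string match. Here is a minimal sketch using Python's standard difflib module; the function name and the 0.6 cutoff are assumptions, and for a large database you would more likely use the database's own text search.

```python
import difflib


def similar_filenames(name: str, existing: list[str], limit: int = 5) -> list[str]:
    """Return stored file names that closely resemble the selected one."""
    # Compare case-insensitively but return the names as stored.
    lowered = {f.lower(): f for f in existing}
    matches = difflib.get_close_matches(name.lower(), list(lowered), n=limit, cutoff=0.6)
    return [lowered[m] for m in matches]
```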
