Saving PDF content to a database with navigation preserved

Good day all.
I am currently researching the best way to save PDF content in a database (as HTML), with the ability to navigate it later on the front end, via the table of contents, once it is rendered as HTML.
The table of contents can also have nested entries. I am currently looking at saving the table of contents as JSON in the database and the PDF pages as separate columns in another table, but that doesn't look like a good option, as updating it later wouldn't be straightforward.
Any pointers to resources and a proposed solution would be appreciated.
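A minimal sketch of one alternative to a single JSON blob plus a pages table, assuming SQLite for brevity and hypothetical names (`toc_entry`, `parent_id`, `position`, `html`): each table-of-contents entry is its own row that points at its parent (an adjacency list) and carries that section's HTML. One section can then be updated without rewriting the rest, and the nested tree for front-end navigation is rebuilt on read.

```python
import sqlite3
from typing import Any

# Hypothetical schema: one row per TOC entry. parent_id is NULL for top-level
# entries, position orders siblings, html holds that section's rendered content.
SCHEMA = """
CREATE TABLE toc_entry (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES toc_entry(id),
    position  INTEGER NOT NULL,
    title     TEXT NOT NULL,
    html      TEXT NOT NULL DEFAULT ''
);
"""


def build_toc(conn: sqlite3.Connection) -> list[dict[str, Any]]:
    """Return the TOC as nested dicts: {"id", "title", "children": [...]}."""
    rows = conn.execute(
        "SELECT id, parent_id, title FROM toc_entry ORDER BY position, id"
    ).fetchall()
    # First pass: create a node for every row so parents always exist.
    nodes = {row_id: {"id": row_id, "title": title, "children": []}
             for row_id, _, title in rows}
    roots: list[dict[str, Any]] = []
    # Second pass: attach each node to its parent, preserving position order.
    for row_id, parent_id, _ in rows:
        if parent_id is None:
            roots.append(nodes[row_id])
        else:
            nodes[parent_id]["children"].append(nodes[row_id])
    return roots


def section_html(conn: sqlite3.Connection, entry_id: int) -> str:
    """Fetch one section's HTML, e.g. when a TOC link is clicked."""
    row = conn.execute(
        "SELECT html FROM toc_entry WHERE id = ?", (entry_id,)
    ).fetchone()
    return row[0] if row else ""
```

The front end can render the returned tree as nested links and request a section's HTML by id when one is clicked. The same table works in PostgreSQL, where a recursive CTE (`WITH RECURSIVE`) could fetch a single subtree if needed.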

Related

How do I save rich text editor data to a PostgreSQL database with DRF and display it in React?

I want to develop a blog application with a Django backend and a React frontend. I shall be using PostgreSQL.
I want to use a rich text editor like Quill to write the blog articles. My questions:
I heard that an article written in a text editor needs to be converted to HTML before being saved in the database. If so, how do I do this in Django Rest Framework?
How do I present the article from the database in the frontend while keeping the same style and formatting?
Say I include multiple photos in the article. How do I save all the photos in the database, i.e. what should the schema be?
I want to clear up my doubts before I jump in.
I'm doing the same thing at the moment. To answer your questions:
In DRF, the simplest way to post the data is to use a TextField in your model. The rich text (with its HTML tags) will be stored in Postgres as-is, and that raw markup is what you'll see in the Admin page or the DRF API.
Then, to re-render it on the front end, you can use any HTML parser library. For example, I'm using react-html-parser, which simply converts the stored HTML back into the styled elements.
Images are a bit trickier, and I haven't done this part myself, but what I can think of right now is to create another model and endpoints to store the images.
When sending the POST request to Django, you would convert each image's base path/URL from the front-end one to the back-end one. For example:
original > http://localhost:3000/image/efewf23r.jpg
new (django) > http://localhost:8000/media/img/img_model/efewf23r.jpg
Then make a second POST request with the image itself, and make sure Django renames the file to match the URL set above.
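The URL rewrite described above might look roughly like the following sketch, written in plain Python rather than Django-specific code. The two base URLs come from the example; the function name is hypothetical, and the regex assumes image URLs appear in the `src` attribute of `<img>` tags in the submitted HTML. It also collects the file names, which are what the second upload request would need.

```python
import re

# Base URLs taken from the example above; adjust for your deployment.
FRONTEND_IMAGE_BASE = "http://localhost:3000/image/"
BACKEND_MEDIA_BASE = "http://localhost:8000/media/img/img_model/"

# Matches src="..." or src='...' inside an <img> tag.
_IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=)(["\'])(.*?)\2', re.IGNORECASE)


def rewrite_image_urls(html: str) -> tuple[str, list[str]]:
    """Point front-end image URLs at the Django media path.

    Returns the rewritten HTML and the file names that still have to be
    uploaded in the second POST request.
    """
    filenames: list[str] = []

    def _swap(match: re.Match) -> str:
        prefix, quote, url = match.groups()
        if url.startswith(FRONTEND_IMAGE_BASE):
            name = url[len(FRONTEND_IMAGE_BASE):]
            filenames.append(name)
            url = BACKEND_MEDIA_BASE + name
        # URLs on other hosts are left untouched.
        return f"{prefix}{quote}{url}{quote}"

    return _IMG_SRC.sub(_swap, html), filenames
```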
Let me know if you found a better solution.
It's been a long time since I posted this question. Since then, I have gained enough working knowledge to make Quill.js (the rich text editor I'm using) work with React.js, or in my case Next.js, so this answer focuses on Quill.js only. The Quill npm package specific to React is react-quill. I have tried to keep this as beginner-friendly as possible.
Quill provides a built-in function, editor.getHTML(), where editor is the current editor instance in which you type the content. This method returns the inner HTML of the content you prepare in the editor.
To save it to the database, simply POST it to your back end. But you must sanitize this inner HTML before it reaches the database. I can't say about the server side, but I had to do this sanitization on the client side; one good package for that is DOMPurify. You need to save this HTML to the database if you want to present the content exactly as it was typed in the browser.
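Because a client-side check can be bypassed by posting to the API directly, sanitizing again on the server is a common complement to DOMPurify. Below is a minimal allowlist sketch using only Python's standard-library `html.parser`. The allowed tag list is a hypothetical subset of what Quill emits; every attribute except a safe `href` is dropped (so Quill's formatting classes, such as alignment, would be lost unless added), and tags are not rebalanced. A maintained sanitizer library is the safer choice in production.

```python
from html import escape
from html.parser import HTMLParser
from typing import Optional

# Hypothetical allowlist, roughly the tags Quill's default formats produce.
ALLOWED_TAGS = {"p", "br", "strong", "em", "u", "s", "a", "ul", "ol", "li",
                "h1", "h2", "h3", "blockquote", "pre", "code"}
DROP_WITH_CONTENT = {"script", "style"}  # discard these tags and their text
VOID_TAGS = {"br"}                       # never emit a closing tag for these
SAFE_SCHEMES = ("http://", "https://", "mailto:")


class _AllowlistSanitizer(HTMLParser):
    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out: list[str] = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag: str, attrs: list[tuple[str, Optional[str]]]) -> None:
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # unknown tag: drop the tag itself but keep its text
        extra = ""
        if tag == "a":
            href = (dict(attrs).get("href") or "").strip()
            # Keep only links with a known-safe scheme (drops javascript: URLs).
            if href.lower().startswith(SAFE_SCHEMES):
                extra = f' href="{escape(href, quote=True)}"'
        self.out.append(f"<{tag}{extra}>")

    def handle_endtag(self, tag: str) -> None:
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in ALLOWED_TAGS or tag in VOID_TAGS:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data: str) -> None:
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize_html(html: str) -> str:
    """Return html reduced to the allowlisted tags, with text re-escaped."""
    parser = _AllowlistSanitizer()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```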
The first point also answers my second question. But one important point: the content written in the Quill editor is also available in a JSON-like format called a delta (see the quill-delta package). You can get the delta with editor.getContents(). You need to POST it to the database as well if you want to edit the content later.
To edit, fetch this delta from the database and initialize the Quill editor with it through the value attribute.
[Image: the delta representation of the text in the editor (source: CodePen)]
There is another function editor.getText() which extracts all the text from the editor.
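A sketch of storing both representations described above, assuming SQLite for brevity (in PostgreSQL a `jsonb` column would suit the delta) and a hypothetical `article` table: the HTML is kept for display, and the delta is stored as JSON for re-editing. `delta_to_text` approximates `editor.getText()` by joining the string inserts of a delta and skipping embeds such as images.

```python
import json
import sqlite3

# Hypothetical table: html is what gets rendered, delta is what the editor reloads.
SCHEMA = "CREATE TABLE article (id INTEGER PRIMARY KEY, html TEXT NOT NULL, delta TEXT NOT NULL)"


def save_article(conn: sqlite3.Connection, html: str, delta: dict) -> int:
    """Store the sanitized HTML for display and the delta (as JSON text) for editing."""
    cur = conn.execute(
        "INSERT INTO article (html, delta) VALUES (?, ?)", (html, json.dumps(delta))
    )
    conn.commit()
    return cur.lastrowid


def load_delta(conn: sqlite3.Connection, article_id: int) -> dict:
    """Return the delta to pass to the editor's value when editing."""
    row = conn.execute("SELECT delta FROM article WHERE id = ?", (article_id,)).fetchone()
    if row is None:
        raise KeyError(article_id)
    return json.loads(row[0])


def delta_to_text(delta: dict) -> str:
    """Plain text from a delta: join string inserts, skip embeds such as images."""
    return "".join(
        op["insert"] for op in delta.get("ops", []) if isinstance(op.get("insert"), str)
    )
```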
Photos: in Quill you generally just insert the photo into the editor, and Quill embeds it in the delta as a base64-encoded data URL. It's that easy; you don't need to worry about separate image fields.
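Because base64 inflates binary data by roughly a third and keeps every image inside the article row, some setups later move embedded images out of the delta. A sketch under that assumption (not part of the answer above): it scans a delta for image inserts whose value is a `data:` URL and decodes them with Python's `base64` module, so the bytes could be saved as files and the inserts replaced with ordinary URLs. The function name is hypothetical.

```python
import base64
import binascii


def extract_images(delta: dict) -> list[tuple[str, bytes]]:
    """Return (mime_type, raw_bytes) for each base64 image embedded in a delta."""
    images: list[tuple[str, bytes]] = []
    for op in delta.get("ops", []):
        insert = op.get("insert")
        if not isinstance(insert, dict):
            continue  # plain text insert
        src = insert.get("image", "")
        # Expected shape: data:<mime>;base64,<payload>; regular URLs are skipped.
        if not src.startswith("data:") or ";base64," not in src:
            continue
        header, payload = src.split(",", 1)
        mime = header[len("data:"):].split(";", 1)[0]
        try:
            images.append((mime, base64.b64decode(payload, validate=True)))
        except binascii.Error:
            continue  # skip malformed payloads
    return images
```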

Conflicts between JSON-LD structured data objects?

Our site is an online clinic management system developed with AngularJS.
I added structured data objects in JSON-LD format. Because our site is a single-page app, I put a general JSON-LD object (@type: Organization) in the layout, and dynamically add breadcrumb (@type: BreadcrumbList) and webpage (@type: WebPage) or article (@type: NewsArticle) data for each page.
Google's Structured Data Testing Tool detected all my objects, and after a while they appeared in my Google Search Console.
The problem is that after I replaced the Organization JSON-LD (@type: Organization) in the layout with a WebApplication one (@type: WebApplication), Google's testing tool still shows all the objects, but all my other detected structured data has disappeared from Search Console!
You can check my data live here.
Are there any conflicts between structured data objects?
For example, is it the case that we cannot put any other structured data alongside @type: WebApplication?
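One way to make the relationships between the objects explicit is to emit them in a single JSON-LD block with an `@graph` and link them by `@id`. This is a hypothetical sketch (URLs, the application name, and the `#webapp`/`#webpage` fragments are placeholders), and it is not a confirmed explanation of the Search Console behaviour described above.

```python
import json


def build_jsonld(site_url: str, page_url: str, page_name: str,
                 breadcrumbs: list[tuple[str, str]]) -> str:
    """Return one <script type="application/ld+json"> block holding an @graph."""
    app_id = site_url + "#webapp"
    graph = [
        {"@type": "WebApplication", "@id": app_id,
         "name": "Clinic Management System", "url": site_url},  # placeholder name
        {"@type": "WebPage", "@id": page_url + "#webpage", "url": page_url,
         "name": page_name, "isPartOf": {"@id": app_id}},  # link page to the app
        {"@type": "BreadcrumbList",
         "itemListElement": [
             {"@type": "ListItem", "position": i, "name": name, "item": url}
             for i, (name, url) in enumerate(breadcrumbs, start=1)
         ]},
    ]
    data = {"@context": "https://schema.org", "@graph": graph}
    # Escape "</" so a value cannot close the <script> tag early; "\/" is valid JSON.
    payload = json.dumps(data).replace("</", "<\\/")
    return f'<script type="application/ld+json">{payload}</script>'
```

In an AngularJS single-page app, the page-specific nodes would be regenerated on each route change while the WebApplication node stays the same.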

Joomla database query based on dropdown selection

I am very new to programming, but after a lot of research I have roughly managed to create a dropdown populated from the database and, using AJAX, show a column value from that table based on the dropdown selection.
I wrote the HTML form, the PHP file that connects to the database, the script that loads the data into the dropdown list, and the script that handles the change event, calls the PHP file that queries the database, and displays the result in a div.
In this particular case, the result is a price stored in the database for the selected destination.
Now my problem starts here:
I'm using Joomla and would like to do this with the code I have, but I have no clue where to save the various HTML, PHP, and JS files.
I tried doing this with ChronoForms but gave up, as I found it very difficult.
Can someone give me some advice on what to do and how to do it?
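The server-side part of the dropdown described above (which the asker wrote in PHP) boils down to two queries. This sketch uses Python with SQLite purely to show the logic; the `destination` table and the function names are hypothetical. The parts worth carrying over to the PHP version are the parameterized query (a placeholder instead of pasting the selected value into the SQL string) and returning JSON that the AJAX callback can read.

```python
import json
import sqlite3

# Hypothetical table standing in for the asker's destination/price data.
SCHEMA = "CREATE TABLE destination (id INTEGER PRIMARY KEY, name TEXT UNIQUE, price REAL)"


def list_destinations(conn: sqlite3.Connection) -> str:
    """JSON for populating the dropdown: [{"id": ..., "name": ...}, ...]."""
    rows = conn.execute("SELECT id, name FROM destination ORDER BY name").fetchall()
    return json.dumps([{"id": r[0], "name": r[1]} for r in rows])


def price_for(conn: sqlite3.Connection, destination_id: str) -> str:
    """JSON answer for the change-event AJAX call; the id arrives as a string."""
    try:
        key = int(destination_id)
    except ValueError:
        return json.dumps({"error": "invalid destination"})
    # The ? placeholder keeps user input out of the SQL text (no injection).
    row = conn.execute("SELECT price FROM destination WHERE id = ?", (key,)).fetchone()
    if row is None:
        return json.dumps({"error": "not found"})
    return json.dumps({"price": row[0]})
```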

Database search field

I'm looking for a way to create a search box in WordPress where visitors can search for a number in the database. Is this possible? I have several package numbers in my database, and I want to give my visitors the ability to search for their package number and retrieve the information that goes with it.
What you want to do can be done.
I suggest a different approach from using wp-exec. (I just looked at the wp-exec website: that plugin was created for WordPress 1.5, which means it hasn't been updated in about 5 years.)
The content you want to display exists entirely outside of WordPress. I suggest you use a custom page template - see
http://codex.wordpress.org/Pages#Creating_Your_Own_Page_Templates
In this case you would not use WordPress posts, pages, or custom post types. On the custom page template you would write (or have someone write, if you don't have the know-how to do it yourself) PHP code to extract the info from the database and display it on a page.
For pages like that, you would be using WordPress only as a container within which to display the results: the custom page would appear in the site navigation, and the page of results would use the site's theme, so it looks like the rest of the site.
But the code to display from the database would not use the WordPress loop. It would be PHP / MySQL data retrieval and display code.
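The data retrieval code such a page template needs follows a simple pattern: normalize the visitor's input, validate it, run a parameterized query, and handle the no-match case. It is sketched here in Python with SQLite only to show those steps; the `package` table, its columns, and the package-number format are hypothetical. In a WordPress template, the query step would typically go through `$wpdb->prepare`.

```python
import re
import sqlite3
from typing import Optional

# Hypothetical format rule: package numbers are 6 to 20 letters or digits.
PACKAGE_NUMBER = re.compile(r"[A-Z0-9]{6,20}")


def find_package(conn: sqlite3.Connection, raw_query: str) -> Optional[dict]:
    """Return the package's details, or None for invalid or unknown numbers."""
    number = raw_query.strip().upper()
    if not PACKAGE_NUMBER.fullmatch(number):
        return None  # reject malformed input before touching the database
    cur = conn.cursor()
    cur.row_factory = sqlite3.Row  # rows become addressable by column name
    row = cur.execute(
        # The ? placeholder keeps the visitor's input out of the SQL text.
        "SELECT number, status, description FROM package WHERE number = ?",
        (number,),
    ).fetchone()
    return dict(row) if row is not None else None
```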
I really doubt you will find a plugin that lets you display results from an external database, formatted the way you want them to appear. The reason is that every external database is different, with its own tables and table structures, and no two sites will want the external data displayed in the same way. So there is little generalization to encapsulate in a plugin, as everyone wants it different.
I've created pages on some sites along the lines of what you want to do thus I know it can be done. But it requires writing custom code.

arachnode.net webpage table is huge

I have used the arachnode.net crawler to crawl a website. The resulting crawl data has produced a database of over 100 GB!
I have looked around the arachnode.net database and found the "webpages" table to be the culprit. When I crawl a website I do not download images, media, or anything similar; I only download the HTML code. However, in this case I can see that the HTML pages contain a huge amount of hidden view data and JavaScript.
So I need to do the crawling again, and this time strip out the hidden view data and JavaScript code before saving to the webpages table.
Does anyone have an idea of how to achieve this?
Thanks.
Yes, you can write a plugin which modifies the CrawlRequest.Data and CrawlRequest.DecodedHtml before the data is inserted into the database.
Create a PostRequest CrawlAction as shown here: http://arachnode.net/Content/CreatingPlugins.aspx
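The stripping step itself can be sketched independently of arachnode.net. The actual plugin would be C#, applied to the HTML before it is inserted; this Python version only shows the transformation. It assumes the "hidden view data" is ASP.NET's `__VIEWSTATE` and similar `<input type="hidden">` fields and, being regex-based, that attribute values contain no `>`; an HTML parser is more robust for messy markup.

```python
import re

# <script ...>...</script>, non-greedy, spanning lines, any case.
SCRIPT_BLOCK = re.compile(r"<script\b[^>]*>.*?</script\s*>", re.IGNORECASE | re.DOTALL)
# <input ... type="hidden" ...> with attributes in any order (e.g. __VIEWSTATE).
HIDDEN_INPUT = re.compile(
    r"""<input\b[^>]*\btype\s*=\s*["']?hidden\b["']?[^>]*>""", re.IGNORECASE
)


def strip_scripts_and_hidden_fields(html: str) -> str:
    """Remove inline/external script blocks and hidden form fields from a page."""
    html = SCRIPT_BLOCK.sub("", html)
    return HIDDEN_INPUT.sub("", html)
```

Comparing the length of a few sample pages before and after stripping would indicate how much of the 100 GB this accounts for before recrawling.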
