EPiServer: find and consolidate duplicated Blocks

We have had a third party perform a content migration for us into EPiServer. Unfortunately the third party didn't quite grasp the concept of Blocks and instead of reusing blocks they have recreated the same block on many pages. I've been searching around to find a way of locating duplicate blocks that contain the exact same text and consolidating them into one shared block.
I'm wondering if anyone has done anything like this before? Will this have to be a manual task of going through all pages and looking for duplicated blocks?
I've been looking at the EPiServer database, and tblContentProperty seems to hold the content I could search for, but I don't know how to isolate blocks specifically. Obviously, even if I could isolate these blocks, I'd still need a way of updating the pages to use a single shared one.
Any help or advice would be greatly appreciated.
Kind regards,
Daniel
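
The detection half of this problem can be sketched independently of the CMS. The snippet below is only an illustration, assuming the block text has already been exported into a mapping of block ID to text (how that export is done, whether through the database or the content API, is left open here). The names `find_duplicates`, `build_remap` and the sample data are hypothetical. It groups blocks whose whitespace-normalized text is identical and proposes one surviving block per group.

```python
import hashlib
from collections import defaultdict


def normalize(text: str) -> str:
    """Collapse runs of whitespace so trivially different copies compare equal."""
    return " ".join(text.split())


def find_duplicates(blocks: dict[int, str]) -> list[list[int]]:
    """Return groups of block IDs whose normalized text is identical."""
    groups: dict[str, list[int]] = defaultdict(list)
    for block_id, text in blocks.items():
        digest = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
        groups[digest].append(block_id)
    # Only groups with more than one member are duplicates.
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


def build_remap(groups: list[list[int]]) -> dict[int, int]:
    """Map every redundant block ID to the lowest ID in its group."""
    return {dup: ids[0] for ids in groups for dup in ids[1:]}


# Hypothetical export: block ID -> block text.
blocks = {
    101: "Contact us on 01234 567890",
    102: "Opening hours: 9 to 5",
    205: "Contact us  on 01234 567890\n",
    317: "Contact us on 01234 567890",
}

duplicates = find_duplicates(blocks)
remap = build_remap(duplicates)
```

The resulting `remap` says which block references on pages would need to be pointed at the surviving shared block; applying that change inside EPiServer is a separate step that this sketch does not cover.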

Related

How to create scalable graphql queries/mutations/fragments structure?

This question might not belong here, but I am struggling a bit with how to structure GraphQL queries/mutations/fragments (using ReactJS/Apollo Client). I have read many articles and at first tried keeping queries next to the component (mostly next to the page root file). That looked fine at first, but as you add more and more pages containing similar components, it gets hard to keep track of all the fields everywhere.
As a second option I tried centralising all queries in one folder and started creating shared fragments. That seems a somewhat better approach, since you only have to change fields in a couple of places and it is more or less done. But this approach also gets complex over time.
How do I structure queries/mutations/fragments in a scalable way? Any suggestions or articles would be helpful.
Second, a more architectural question: where do you transform data for the view? Is it better to normalize using an Apollo TypePolicy, in a higher-order component, or somewhere else?
I know these are fairly broad questions and the answers depend on the use case (what kind of app you are building and so on), but let's say you are building a project that needs many features added on the go, not a simple presentational website.
Thank you, whoever suggests anything.

One Google Docs workbook as a database for another using a script?

Disclaimer: I started working with spreadsheets in depth this week, prior to that it was basic usage. I've read the rules and this does relate to programming, it's just my ignorance of programming keeps me from asking a specific question. I'm new to this, I want to learn, I have to start somewhere.
I want to create two separate spreadsheet documents, one as a database for another. I want one to be able to query the other in a way similar to the VLOOKUP() function or something along those lines.
These are very large files hence the need for separate documents.
I am learning about scripting and think there might be a way there. If that's the case please appreciate that I literally started reading about scripts this morning and know nothing (yet) about them.
All I need to know is, if it's possible and what functions to use, I'll figure out how to use them. I just don't have a working knowledge of all the script functions, and a limited knowledge of spreadsheet functions.
The IMPORTRANGE() function is limited to 50 per spreadsheet; given how I want to use it, that is not enough, unless you know a workaround. Also, I only want one cell of information at a time, and it doesn't need to be displayed, just usable.
Also, efficiency is king since I'm working with such large amounts of data. I used to have almost 1500 VLOOKUP functions as I was building what I already have and that sucker was starting to bog down. Then I realized I didn't need a dynamic database for that aspect of the sheet. I killed about two thirds of them and it runs much better. I'd like to keep it that way, or at least try.
Finally I may have bitten off more than I can chew, but this has been a fun challenge for me, and I've met with success so far. Please don't dismiss me out of hand because I don't know the right questions to ask, or I'm trying to fit a square peg in a round hole, everyone has to start somewhere.
Thanks!
This is totally possible, though you will quickly find that spreadsheet functions are too cumbersome for this sort of operation.
With Google Apps Script you can read from and write to multiple workbooks with ease. You would be working in JavaScript, using JavaScript objects and arrays.
Start by reading the Google documentation and checking out their examples.

Trouble understanding block device API- any good link?

I really don't know if I'm just not looking in the right places, but I seriously haven't found any clear source on how filesystems interact with the underlying block devices. I understand that there are several layers involved in the process, and if I understand correctly, it's the request method in the device driver that ultimately handles data reads and writes. I just don't know how to add these requests to the queue. I do know about the block_device_operations struct, but none of those methods seem to be what I am looking for.
Am I missing anything obvious? I really don't mean to bother anyone, it's just that I cannot find this and I've got pretty much the rest of my (basic) filesystem covered (I've encapsulated all reads/writes in some custom functions which remain unimplemented).
Edit: updated question to make it more specific in here. I've got a better idea of what I need, but there's still plenty of which I don't fully grasp.

Creating and using databases

So the solid consensus I got from the answers to this question: Editing a single line in a large text file
was that instead of using a text file I should create a database and store my data there. While I think this is a great idea, I don't know the first thing about databases, the programming languages used for databases, or how to use a database once I have set it up. Could you guys give me a shove in the right direction and point me to an absolute noob tutorial that might help me with this?
UPDATE: Hey guys, so I was looking at MySQL and there are a whole bunch of versions! The Cluster CGE looks like the best one; it is described as a "real-time open source transactional database designed for fast, always-on access to data under high throughput conditions", which just about hits the nail on the head of what I need. It says commercial next to it though, so I don't know if I would have to pay some god-awful fee for it. I tried it anyway, and it said I should have gotten a license already, and until I did I could only use it for 30 days. I'm confused...
Can I get this version for free? If so, where do I get the license?
Is this version way overpowered for what I need? I need:
1. A storage medium through which I can store large amounts of data
2. Read and write from in real time with simultaneous access
3. Have two different "keys" (I think I'm using that right; I need to be able to search for entries based on one of two criteria).
MySQL is a great choice, given your Python flair.
http://dev.mysql.com/tech-resources/articles/mysql_intro.html
Good luck!
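
To make the three requirements above a bit more concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in rather than MySQL, so it runs without installing a server. The table name `entries` and the columns `name`, `code` and `payload` are made up for illustration. The same SQL ideas (one table, an index on each of the two search columns, and an UPDATE to change a single row) carry over to MySQL.

```python
import sqlite3

# In-memory database for the demo; pass a file path to persist the data.
conn = sqlite3.connect(":memory:")

# One table with two searchable columns (hypothetical names).
conn.execute(
    "CREATE TABLE entries ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " code TEXT NOT NULL,"
    " payload TEXT)"
)
# An index on each column keeps lookups by either criterion fast.
conn.execute("CREATE INDEX idx_entries_name ON entries (name)")
conn.execute("CREATE INDEX idx_entries_code ON entries (code)")

rows = [("alpha", "A-1", "first"), ("beta", "B-2", "second")]
conn.executemany(
    "INSERT INTO entries (name, code, payload) VALUES (?, ?, ?)", rows
)
conn.commit()


def find_by_name(name):
    """Look up entries by the first key."""
    query = "SELECT name, code, payload FROM entries WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


def find_by_code(code):
    """Look up entries by the second key."""
    query = "SELECT name, code, payload FROM entries WHERE code = ?"
    return conn.execute(query, (code,)).fetchall()


# Changing one record is a single UPDATE, not a rewrite of the whole file.
conn.execute("UPDATE entries SET payload = ? WHERE code = ?", ("edited", "B-2"))
conn.commit()
```

SQLite allows only one writer at a time, so for heavy simultaneous access a server database such as MySQL is likely the better fit; the sketch is only meant to show the shape of the schema and queries.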

Automatic document tagging

I started working on a project in which I must tag documents with keywords, and it is really hard and time-consuming to do manually (especially if you have thousands of documents). So I am planning to automate the process (knowing that the result would not be perfect, but at least it would give some suggested tags).
In the latest Firefox version they implemented a system like this (when you bookmark a page, it suggests some tags).
The Yahoo Term Extraction service is also a great example.
So if anybody can help me get around this problem I would really appreciate it. Or if someone knows about the Firefox tagging system, a little bit of help would be great.
Would a statistical algorithm work? Something Bayesian perhaps? I know they're used in spam filtering, maybe you can adapt a Bayes filter to suit your needs.
At the very least, you could suggest words that are used frequently in the document but are not common English words (he, she, I, and, it, then, or, etc.).
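
The frequency idea can be sketched in a few lines. This is only an illustration of that simple baseline, not a Bayesian approach: the stop-word list is a tiny hand-picked sample, and the function name `suggest_tags` and the minimum word length are assumptions made for the example.

```python
import re
from collections import Counter

# A small sample of common English words to ignore; a real list would be longer.
STOP_WORDS = {
    "he", "she", "i", "and", "it", "then", "or", "the", "a", "an", "of",
    "to", "in", "is", "that", "for", "on", "with", "as", "are", "be",
    "this", "you",
}


def suggest_tags(text: str, limit: int = 5) -> list[str]:
    """Suggest the most frequent words that are not common English words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(
        word for word in words
        if word not in STOP_WORDS and len(word) > 2
    )
    return [word for word, _ in counts.most_common(limit)]


doc = (
    "Python scripts query the database. "
    "The database stores rows, and Python reads the database."
)
tags = suggest_tags(doc, limit=2)
```

For this sample text the two suggestions are "database" and "python". Comparing word frequency in one document against frequency across the whole collection would be a natural next refinement.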
