Different choices for inputting text into database


I am making a database of images. Each image has a certain amount of Chinese text in it. When I have made the database, I want to be able to search the text and find the image it is attached to, and vice-versa.
In each image, the largest and highest-up-the-page text is the "main text", and the rest are "secondary text". In a few images, the number of discrete secondary texts is high (>10 pieces). The images' text varies quite a lot, so the chances that one image will have the same text as another are small. See an example image (below) where there is one main text (某词), and three secondary texts (战略, 微笑, 容易).
My question is: Should I have two columns in my "images" table, where the first column is "main text" and the other column is "secondary text", containing all other text in the image? Or should I have multiple columns allowing for X number of discrete secondary texts? I have so far assumed that it would be relatively meaningless to create a "text" table and connect it with the images table via an associative table, since the majority of the texts are unique.
A problem: because the images I am looking at largely do not use punctuation, and Chinese is written without spaces between words, putting all "secondary text" in one column could very easily lead to confusing query results. For example, if I take all three pieces of secondary text in the example image (战略, 微笑 and 容易) and concatenate them, I get 战略微笑容易. But this string now potentially contains five words: 战略, 略微, 微笑, 笑容, 容易. This would create chaos in my queries.
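The ambiguity described here is easy to reproduce. A minimal sketch in Python, using the three secondary texts from the example image (the variable names are illustrative only):

```python
# Secondary texts from the example image, kept as separate pieces.
secondary_texts = ["战略", "微笑", "容易"]

# Concatenating them into a single column value loses the boundaries.
concatenated = "".join(secondary_texts)

# The five two-character words listed in the question, including the
# two that only exist because they span a boundary between pieces.
candidates = ["战略", "略微", "微笑", "笑容", "容易"]

# Words a substring search would find in the concatenated value even
# though they never appear as a piece of text in the image.
false_matches = [
    word
    for word in candidates
    if word in concatenated and word not in secondary_texts
]

print(concatenated)
print(false_matches)
```

A substring search on the concatenated value matches 略微 and 笑容, while an exact comparison against the separate pieces does not.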

The cleanest way would probably be to create a separate table for (at least secondary) texts. If they are largely unique you can do this without an association table.
Table images:
ID | image | main text
---+-------+----------
1 | <img> | 某词
Table secondary:
ID | image_id | secondary text
---+----------+---------------
1 | 1 | 战略
2 | 1 | 微笑
3 | 1 | 容易
If it fits your use case better, you could also put all your texts along with their type into the second table:
Table images:
ID | image
---+-------
1 | <img>
Table text:
ID | image_id | text | type
---+----------+------+----------
1 | 1 | 战略 | 2
2 | 1 | 微笑 | 2
3 | 1 | 容易 | 2
4 | 1 | 某词 | 1
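As a rough sketch of the second layout, assuming SQLite via Python's built-in sqlite3 module (the question does not name a database engine, and the helper names images_with_text and texts_of_image are hypothetical), lookups in both directions could look like this:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE images (
        id    INTEGER PRIMARY KEY,
        image BLOB
    );
    CREATE TABLE texts (
        id       INTEGER PRIMARY KEY,
        image_id INTEGER NOT NULL REFERENCES images(id),
        text     TEXT NOT NULL,
        type     INTEGER NOT NULL  -- 1 = main text, 2 = secondary text
    );
""")

# The example image and its four pieces of text (image data omitted).
conn.execute("INSERT INTO images (id, image) VALUES (1, NULL)")
conn.executemany(
    "INSERT INTO texts (image_id, text, type) VALUES (?, ?, ?)",
    [(1, "战略", 2), (1, "微笑", 2), (1, "容易", 2), (1, "某词", 1)],
)


def images_with_text(conn, text):
    """Return the ids of images that contain exactly this piece of text."""
    rows = conn.execute(
        "SELECT DISTINCT image_id FROM texts WHERE text = ? ORDER BY image_id",
        (text,),
    )
    return [row[0] for row in rows]


def texts_of_image(conn, image_id):
    """Return (text, type) pairs for one image, main text first."""
    rows = conn.execute(
        "SELECT text, type FROM texts WHERE image_id = ? ORDER BY type, id",
        (image_id,),
    )
    return rows.fetchall()


print(images_with_text(conn, "微笑"))
print(texts_of_image(conn, 1))
```

Because every piece of text is its own row, an exact-match search for 略微 finds nothing, which avoids the boundary problem raised in the question.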

Related

Natural key when combining two source tables that use similar primary keys

I have two source tables, one is basically an invoice, the other is a migrated invoice. The same object should probably have been used for both, but I have this instead. They contain most of the same data.
I had thought to combine both into a dimension table, however both will use the same natural keys. How should I approach this?
One potential solution I thought of was using negative numbers for the migrated table, but then the natural keys won't align exactly with the source.
Do I just combine them in the fact table? Then I can't link back to the dimension table for either due to NULLs.
Or do I add an additional column or information to indicate which type of invoice it is?
EDIT
Simple models of the current tables below.
The dimension currently contains only the non-migrated data. It has a primary key; however, if I merge the migrated invoice table into it, it will appear as if the changes were made to the original invoices rather than to a second set of invoices.
Dimension
surrogate_key | source_pk | Total | scd_from   | scd_to
--------------+-----------+-------+------------+-----------
1             | 1         | 100   | 01/01/2019 | 31/01/2019
2             | 1         | 150   | 01/02/2019 | 31/12/2019
3             | 2         | 50    | 01/01/2019 | 31/12/9999
Source invoice table
pk | Total
---+------
1  | 150
2  | 50
Source migrated invoice table
pk | total
---+------
1  | 200
2  | 300

If the invoice and the migrated invoice share the same natural key but some of the fields have different values (in your example, the Total differs between them), keep one row per natural key in the dimension and use two separate columns to represent the two sources. Based on your example, the dimension needs invoice_Total and migrated_invoice_Total columns.
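A minimal sketch of that shape in Python, using the totals from the question. It deliberately ignores the surrogate key and the scd_from/scd_to history columns, and the function name build_dimension_rows is hypothetical:

```python
# Source tables from the question, reduced to pk -> total.
invoices = {1: 150, 2: 50}
migrated_invoices = {1: 200, 2: 300}


def build_dimension_rows(invoices, migrated_invoices):
    """Build one row per natural key with one total column per source."""
    rows = []
    # The union of keys keeps invoices that exist in only one source;
    # the missing side is stored as None.
    for pk in sorted(invoices.keys() | migrated_invoices.keys()):
        rows.append(
            {
                "source_pk": pk,
                "invoice_Total": invoices.get(pk),
                "migrated_invoice_Total": migrated_invoices.get(pk),
            }
        )
    return rows


dimension_rows = build_dimension_rows(invoices, migrated_invoices)
for row in dimension_rows:
    print(row)
```

With this layout the natural key still matches the source exactly, and a change to the migrated total no longer looks like a change to the original invoice.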

Pulling ill-formatted data in Libre Calc: What Function will work with this?

I am working on a project where I am pulling tables from a Fandom Wikia page and feeding it into a spreadsheet named 'WikiPullSheet'. The data in the wiki tables is irregular in format; sometimes using multiple rows for the same entry.
Here is an example of some rows as described above from the sheet:
Name | Power | Stamina | Agility
Townsman Shield | 2 | 1 | 2
Starter | | |
Broken Shield | 4(+1) | 2(+1) | 2(+1)
Z1 | | |
Heater | 2(+1) | 4(+1) | 2(+1)
Z1 | | |
Wood Elf Shield | 2(+1) | 2(+1) | 4(+1)
Z1 | | |
Shiv | 4 | 4 | 3
Z1 Shop | | |
Deimos* | 26 | 16 | 26
| 34 | 22 | 34
I want the sheet to auto-update from the wikia page but this format will not allow me to reference items as the sheet expands. For instance, if on another sheet I want to have a drop down list of all the names for items in this list, I would be referencing the blank and starter cells even though they are not actually unique items in the table. I have done research on VLOOKUP, COUNTIF, REGEX options, MATCH, and more, but none of these seem to work for the issue I am having.
How would I take this input and either create a formula to reformat it or pull from the sheet as is and use the columns appropriately for a drop-down box containing only the item names from the NAME column?
Desired Output:
I need the data to end up formatted with each row representing a different unique item. Since the information is pulling with rows that contain location of the item in the name column (Z1 for instance), this is proving difficult. I could simply remove the rows that cause problems such as 'Z1' & 'Z1 Shop' in the above example, however this does not help when an item has multiple upgrade paths like in the case of the 'Deimos' row entry.

If you insert a pivot table (there is an icon to do so; select ColumnA first) based on ColumnA (assuming that is where Name is to be found) you should get something like:
It is far from a complete solution (you don't show what the desired output should be) but I thought a sorted list, with each entry unique and the blanks at least out of the way, might have been a start.
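If preprocessing the pulled data outside Calc is an option, the grouping the question asks for can be sketched in Python. This is not the pivot-table approach, and it rests on an assumption about the wiki format that the question only implies: a row with a name and stats starts a new item, a row with a name but no stats is the location of the previous item, and a row with stats but no name is a further upgrade path of the previous item. The function name group_items is hypothetical.

```python
# Rows as they appear in the sheet: (Name, Power, Stamina, Agility).
rows = [
    ("Townsman Shield", "2", "1", "2"),
    ("Starter", "", "", ""),
    ("Broken Shield", "4(+1)", "2(+1)", "2(+1)"),
    ("Z1", "", "", ""),
    ("Heater", "2(+1)", "4(+1)", "2(+1)"),
    ("Z1", "", "", ""),
    ("Wood Elf Shield", "2(+1)", "2(+1)", "4(+1)"),
    ("Z1", "", "", ""),
    ("Shiv", "4", "4", "3"),
    ("Z1 Shop", "", "", ""),
    ("Deimos*", "26", "16", "26"),
    ("", "34", "22", "34"),
]


def group_items(rows):
    """Fold location rows and extra stat rows into the item above them."""
    items = []
    for name, *stats in rows:
        name = name.strip()
        has_stats = any(cell.strip() for cell in stats)
        if name and has_stats:
            # A named row with stats starts a new item.
            items.append({"name": name, "location": None, "stats": [tuple(stats)]})
        elif not items:
            # Continuation row before any item: nothing to attach it to.
            continue
        elif name:
            # A named row without stats is treated as the item's location.
            items[-1]["location"] = name
        elif has_stats:
            # An unnamed row with stats is treated as another upgrade path.
            items[-1]["stats"].append(tuple(stats))
    return items


items = group_items(rows)
item_names = [item["name"] for item in items]
print(item_names)
```

The resulting item_names list contains only real item names, which is what a drop-down source needs, while the location and the extra Deimos stat row stay attached to their item.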

Cassandra/Solr data model improvement

I have the following table:
CREATE TABLE videos_tags (
    id text,
    tag text,
    video text,
    someotherfield bigint,
    PRIMARY KEY (id)
) WITH gc_grace_seconds = 1296000
AND compaction = {'class': 'LeveledCompactionStrategy'}
AND compression = {'sstable_compression': 'LZ4Compressor'};
The table stores a list of tags and videos. A video can have one or more tags; and a tag can be attributed to more than one video. Example:
id | tag | video
------------------------------------------
1 | dancing | video1
2 | singing | video2
3 | prank | video3
4 | prank | video4
5 | funny | video3
6 | cover | video2
I want to show my users a list of related videos based on tag assignment: the more tags a certain video has in common with the user's video, the more "related" it is. My current approach consists of two steps:
1. Get a list of the user's video's tags:
   q=*:*&fq=video:video1&fl=tag
2. Identify the videos that use the same tags as the user's video and select the top 10 (result set slicing is done on the application side):
   q=*:*&fq=tag:tag1 AND tag:tag2 AND tag:tag3 AND !video:video1&fl=video&stats=true&stats.field=someotherfield&stats.facet=video
Note: I used stats instead of a plain facet because I also need the sum of someotherfield.
This approach yields an average execution time of 30 seconds. Unfortunately, the maximum acceptable query time for my app is 10 seconds.
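To make the intended ranking concrete, here is a small in-memory sketch in Python of the "more shared tags means more related" rule, using the example rows from the table. It only illustrates the logic, says nothing about how to run it against 1.5 billion rows, and the function name related_videos is hypothetical:

```python
from collections import Counter

# Example rows from the table: (id, tag, video).
rows = [
    ("1", "dancing", "video1"),
    ("2", "singing", "video2"),
    ("3", "prank", "video3"),
    ("4", "prank", "video4"),
    ("5", "funny", "video3"),
    ("6", "cover", "video2"),
]


def related_videos(rows, video, limit=10):
    """Rank other videos by the number of tags they share with `video`."""
    # Step 1: the tags of the user's video.
    tags = {tag for _, tag, v in rows if v == video}
    # Step 2: count, per other video, how many of those tags it carries.
    shared = Counter(v for _, tag, v in rows if tag in tags and v != video)
    # Most shared tags first; ties broken by video name for stable output.
    return sorted(shared.items(), key=lambda item: (-item[1], item[0]))[:limit]


print(related_videos(rows, "video3"))
```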
Is there a better approach to tackling this data requirement? I'm open to:
Alternative query approach (minor tweaks are preferred; but I can accept something as drastic as replacing my 2-step approach completely)
Alternative schema
Notes:
The actual schema has several other fields that I removed from this post for brevity
I do all read operations via Solr (Datastax Enterprise 4.6.0). Nothing fancy in the Solr schema
The table currently holds 1.5 billion rows, but could double or triple within a few years (so the solution must take into account the table/index size)
No fulltext search - only exact string filters

How to merge two Excel sheets

I have an Excel document with 10000 rows of data in two sheets. One sheet has the product costs, and the other has the category and other information. Both are imported automatically from the SQL server, so I don't want to move them to Access, but I still want to link them by product code so that when I merge the product name and cost into the same table, I can be sure that I'm getting the right information.
For example:
Code | name | category
------------------------------
1 | mouse | OEM
4 | keyboard | OEM
2 | monitor | screen
Code | cost |
------------------------------
1 | 123 |
4 | 1234 |
2 | 1232 |
7 | 587 |
Let's say my two sheets have tables like these. As you can see, the second one has an entry that doesn't exist in the first. I put it there because in reality one sheet has a few more entries, which prevents a perfect match. Therefore I can't just sort both tables A-Z and copy the costs across: there are more than 10000 products in that database, and those extra entries would shift the costs and ruin the whole table.
So what would be a good way to get the entry from the other sheet and insert it into the right row when merging? Linking the two tables by a field name? Checking a field and trying to match it with the other sheet? Anything at all.
Note: In Access I would define relationships, and a query would match the rows automatically. I was wondering if there's a way to do that in Excel too.

Why not use VLOOKUP? If there is a match, it will return the cost. Assuming the top table is on Sheet1 and the other on Sheet2, and both start at cell A1, you just need this in cell D2:
=VLOOKUP(A2,Sheet2!A:B,2,0)
You can then drag it down. The easiest way to fill all your 10000 rows is to hover over the bottom-right corner of the cell with your cursor. It will turn from a white plus sign into a thin black one. Then simply double-click.
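For readers more used to code than spreadsheet formulas, the exact-match lookup that the formula performs can be sketched in Python with the sample data from the question (the variable names are illustrative). Each product code is looked up in the cost table, and codes that exist only in the cost table, such as 7, are simply never asked for:

```python
# Sheet1: (Code, name, category)
products = [
    (1, "mouse", "OEM"),
    (4, "keyboard", "OEM"),
    (2, "monitor", "screen"),
]

# Sheet2: (Code, cost), including an extra code with no product row.
costs = [(1, 123), (4, 1234), (2, 1232), (7, 587)]

# Index the cost sheet by code, which is what the lookup matches on.
cost_by_code = dict(costs)

# Add the cost as a fourth column; a code with no match gets None here,
# where VLOOKUP with an exact match would show #N/A.
merged = [
    (code, name, category, cost_by_code.get(code))
    for code, name, category in products
]

for row in merged:
    print(row)
```

Because the match is made on the code rather than on row position, the extra entries in the cost sheet cannot shift any cost onto the wrong product.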

Just use VLOOKUP: you can add a column to your first sheet and find the cost based on the code in the other sheet.

Database relationships - 1:1 but not always?

Apologies for the fairly unhelpful title, if you have a better suggestion please feel free to edit it.
I'm using CakePHP and the bake functionality (I don't need to bake however).
What's the best way of achieving the following:
table schema:
table ranges
id | name | description
table images
id | range_id | picture
table info (this is where I am confused)
id | range_id | height | width | colour
Basically, one range may have many images (1:Many). I can show this fine.
Now, each range will have an entry in the info table (1:1) and some attributes about the range such as height, colour, width. But not always...
Let's say I have a range foo. foo has five images that all have the same height, width and colour. However, foo has one image that is a different size and a different colour.
When the attributes differ, I need to show this information with the respective image, rather than the range's default information. So this image will need its own entry in the info table.
Does this even make sense? Or am I going about this entirely the wrong way?
My application, in brief:
(If it helps, think of "range" as a product)
User selects a range
User views images in the range
User can click an image, and the information from info pops up about that range.
Some images have different attributes, but still belong to the same range.
How can I make this distinction and store it appropriately?
Please let me know if I can clarify further. Thank you.

I've needed to do this on occasion where a parent entity has a value that can get "overridden" by a child entity.
There are a couple of approaches you can take, the structure being the easiest part.
Consider the following structure:
table ranges
id | name | description | default_info_id
table images
id | range_id | picture | info_id
table info
id | height | width | colour
When does images.info_id have a value? There are two choices:
1. Populate images.info_id with the default_info_id from the parent. The user can then override it on the image.
   Pros
      You never need to look at the range to figure out what the info is on the image.
   Cons
      You need to decide what to do when ranges.default_info_id changes. Does it affect the existing images, or does it apply only to future ones?
2. Only populate images.info_id when it's different from the parent's.
   Pros
      If ranges.default_info_id changes while images.info_id is null, the image's info automatically changes as well.
   Cons
      You need to decide what to do when ranges.default_info_id changes. Do you now need to null out any images.info_id that is the same as the parent's?
      You need to look at the ranges table to figure out what the info is for an image when its info_id is null.
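A minimal sketch of the second choice, assuming SQLite via Python's built-in sqlite3 module rather than CakePHP, with made-up sample values for heights, widths and colours. The hypothetical helper info_for_image uses COALESCE to fall back to the range's default when the image has no info_id of its own:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE info (
        id INTEGER PRIMARY KEY, height INTEGER, width INTEGER, colour TEXT
    );
    CREATE TABLE ranges (
        id INTEGER PRIMARY KEY, name TEXT, description TEXT,
        default_info_id INTEGER REFERENCES info(id)
    );
    CREATE TABLE images (
        id INTEGER PRIMARY KEY, range_id INTEGER REFERENCES ranges(id),
        picture TEXT,
        info_id INTEGER REFERENCES info(id)  -- NULL = use the range default
    );
    -- Illustrative data: range foo with one default image and one override.
    INSERT INTO info VALUES (1, 100, 50, 'red'), (2, 120, 60, 'blue');
    INSERT INTO ranges VALUES (1, 'foo', 'example range', 1);
    INSERT INTO images VALUES (1, 1, 'a.png', NULL), (2, 1, 'b.png', 2);
""")


def info_for_image(conn, image_id):
    """Return (height, width, colour), preferring the image's own info."""
    return conn.execute(
        """
        SELECT info.height, info.width, info.colour
        FROM images
        JOIN ranges ON ranges.id = images.range_id
        JOIN info ON info.id = COALESCE(images.info_id, ranges.default_info_id)
        WHERE images.id = ?
        """,
        (image_id,),
    ).fetchone()


print(info_for_image(conn, 1))  # image without its own info_id
print(info_for_image(conn, 2))  # image that overrides the default
```

The join through ranges is the cost named in the cons above: resolving an image's info always involves the parent table when info_id is null.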
You can have several varieties of the above data structures, but you'll still need to figure out when to populate what. Here are two others you could consider that are valid (but less optimal in my opinion):
Info has an FK to both tables but one is always null
table ranges
id | name | description
table images
id | range_id | picture
table info
id | range_id | image_id | height | width | colour
No Info Table at all
table ranges
id | name | description | default_height | default_width | default_colour
table images
id | range_id | picture | height | width | colour
