Structuring the table of a multi-lingual dictionary database

I want to structure the table(s) for the database of a multi-lingual dictionary (English - Marathi). Marathi is a regional language in India.
The format of the dictionary is:
word | english_meaning1 | marathi_meaning1 | english_meaning2 | marathi_meaning2 ... english_meaningN | marathi_meaningN
A word can have a variable number of pairs of English and Marathi meanings, depending on the lexical categories (noun, adverb, verb, adjective, etc.) it belongs to.
Currently I have thought of an inefficient approach of creating a table like this:
Table: word
word_id
word
english_meaning1
marathi_meaning1
english_meaning2
marathi_meaning2
english_meaning3
marathi_meaning3
english_meaning4
marathi_meaning4
.
.
.
.
english_meaning10
marathi_meaning10
Here I am assuming a fixed number of columns (20) for english and marathi meanings for a word in English. So if a word has only a single meaning (in English & Marathi), the rest of the columns will remain empty.
Also, a word such as 'about' is shown in the dictionary as:
about1 - meanings
about2 - meanings
In that case I maintain them as separate rows in the table structured above.
Isn't this approach problematic? Can this be solved by normalizing it? I have thought of a way where the tables will be:
Table: word
word_id
word
Table: word_english
id
word_id (FK from word table)
english_meaning
Table: word_marathi
id
word_id
marathi_meaning
I am not sure whether the above approach makes sense. Could anyone suggest a possible solution? Thanks in advance!

Ooof. Definitely normalize:
word
---------
word_id
word
definition
language_id
lexical_part_id
language
-----------
language_id
name
word_word
------------
word1_id
word2_id
lexical_part
-------------
lexical_part_id
name
Then fill in the word_word table with the equivalence map: one row for each pair of words that are translations of each other.
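As a rough illustration of the schema above, here is a minimal sketch using Python's built-in sqlite3 module. The column types, the constraints, the sample rows and the helper name translations are assumptions added for the example; only the table and column names come from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE language (
    language_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE
);
CREATE TABLE lexical_part (
    lexical_part_id INTEGER PRIMARY KEY,
    name            TEXT NOT NULL UNIQUE
);
CREATE TABLE word (
    word_id         INTEGER PRIMARY KEY,
    word            TEXT NOT NULL,
    definition      TEXT,
    language_id     INTEGER NOT NULL REFERENCES language(language_id),
    lexical_part_id INTEGER REFERENCES lexical_part(lexical_part_id)
);
-- Equivalence map: each row links a word to one counterpart.
CREATE TABLE word_word (
    word1_id INTEGER NOT NULL REFERENCES word(word_id),
    word2_id INTEGER NOT NULL REFERENCES word(word_id),
    PRIMARY KEY (word1_id, word2_id)
);
""")

# Illustrative sample data (not taken from the question).
conn.execute("INSERT INTO language VALUES (1, 'English'), (2, 'Marathi')")
conn.execute("INSERT INTO lexical_part VALUES (1, 'Noun')")
conn.execute("INSERT INTO word VALUES (1, 'water', 'a clear liquid', 1, 1)")
conn.execute("INSERT INTO word VALUES (2, 'पाणी', NULL, 2, 1)")
conn.execute("INSERT INTO word_word VALUES (1, 2)")  # water -> पाणी


def translations(conn, word, target_language):
    """Return the words in target_language that word is mapped to."""
    rows = conn.execute(
        """
        SELECT w2.word
        FROM word w1
        JOIN word_word ww ON ww.word1_id = w1.word_id
        JOIN word w2      ON w2.word_id = ww.word2_id
        JOIN language l   ON l.language_id = w2.language_id
        WHERE w1.word = ? AND l.name = ?
        ORDER BY w2.word_id
        """,
        (word, target_language),
    ).fetchall()
    return [r[0] for r in rows]


print(translations(conn, "water", "Marathi"))
```

Because every meaning is its own row, a word with one meaning and a word with ten need the same columns. Note that this sketch stores the mapping in one direction only; a lookup from Marathi to English would need either a second word_word row or a query that checks both columns.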

Related

Many to Many Database Relationship Design - to enable Word Clouds

I'm relatively new to database design and struggling to introduce a many-to-many relationship in an SSAS Tabular model.
I have some 'WordGroup' performance data in one table, like so:
WordGroup | IndexedVolume
Dining | 1,000
Sports | 2,000
Movies | 1,600
... and so on
Then I have 'Words' contained within these 'WordGroups' sitting in another category table, like so:
WordGroup | Word
Dining | Restaurant
Dining | Food
Dining | Dinner
Sports | Football
Sports | Basketball
... and so on
I can't see performance data (IndexedVolume) at the 'Word' level, only by the 'WordGroup' that contains it. In the example above, I can't look at the IndexedVolume of 'Football' on its own; I can only choose the 'Sports' WordGroup that contains Football.
However, when analysing by 'WordGroup' I would still like users to understand what 'Words' are included (ideally in a Word Cloud Visualisation). Therefore, I wanted to develop a relationship between these two tables, so when someone chooses a Word Group (or multiple) we can return the Words that are contained within the Word Group(s) - i.e. below.
User selects Dining WordGroup
<<<Word Cloud or Flat Table would show Words below>>>
Restaurant
Food
Dinner
I looked at Concatenate / Strings etc, but was deterred as the detail here is much more complex and each WordGroup may contain 10+ Words, with translations.
Any advice would be greatly appreciated!
If analyzing by WordGroup is an obligatory requirement, you should use three tables: word, word_group, and a junction table word_group_has_word.
The many-to-many relationship applies because your words may be connected to one or more groups (e.g. "tree" is connected to "environment", "forest", etc.), and obviously one word_group is connected to many words.
To see performance data by word, use:
select w.idword, w.name, sum(wg.index_volume)
from word w
left join word_group_has_word wgw
on w.idword = wgw.word_id
left join word_group wg
on wg.idword_group = wgw.word_group_id
group by w.idword, w.name
This gives, for each word, the sum of the index volumes of all the word groups connected to it. And if you want to see the words connected to the selected word groups, use:
select distinct w.idword, w.name
from word w
join word_group_has_word wgw
on w.idword = wgw.word_id
where wgw.word_group_id in (listWordGroupsId)
Here listWordGroupsId stands for the comma-separated ids of the selected word groups.
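To make the junction-table idea concrete, here is a small sketch with Python's sqlite3 module, loaded with the sample rows from the question. The column types, the constraints and the helper name words_in_groups are assumptions made for the example; the table and column names follow the queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE word (idword INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE word_group (
    idword_group INTEGER PRIMARY KEY,
    name         TEXT NOT NULL UNIQUE,
    index_volume INTEGER NOT NULL
);
-- Junction table: one row per (group, word) membership.
CREATE TABLE word_group_has_word (
    word_group_id INTEGER NOT NULL REFERENCES word_group(idword_group),
    word_id       INTEGER NOT NULL REFERENCES word(idword),
    PRIMARY KEY (word_group_id, word_id)
);
""")

groups = {"Dining": 1000, "Sports": 2000, "Movies": 1600}
members = {
    "Dining": ["Restaurant", "Food", "Dinner"],
    "Sports": ["Football", "Basketball"],
}

for name, volume in groups.items():
    conn.execute(
        "INSERT INTO word_group (name, index_volume) VALUES (?, ?)", (name, volume)
    )
for group, words in members.items():
    group_id = conn.execute(
        "SELECT idword_group FROM word_group WHERE name = ?", (group,)
    ).fetchone()[0]
    for w in words:
        conn.execute("INSERT OR IGNORE INTO word (name) VALUES (?)", (w,))
        word_id = conn.execute(
            "SELECT idword FROM word WHERE name = ?", (w,)
        ).fetchone()[0]
        conn.execute(
            "INSERT INTO word_group_has_word VALUES (?, ?)", (group_id, word_id)
        )


def words_in_groups(conn, group_names):
    """Return the distinct words contained in any of the named groups."""
    placeholders = ", ".join("?" for _ in group_names)
    rows = conn.execute(
        f"""
        SELECT DISTINCT w.name
        FROM word w
        JOIN word_group_has_word wgw ON wgw.word_id = w.idword
        JOIN word_group wg           ON wg.idword_group = wgw.word_group_id
        WHERE wg.name IN ({placeholders})
        ORDER BY w.name
        """,
        list(group_names),
    ).fetchall()
    return [r[0] for r in rows]


print(words_in_groups(conn, ["Dining"]))
```

The result of words_in_groups is the flat list a word cloud would be fed with. Selecting several groups just means passing more names; DISTINCT keeps a word that belongs to two selected groups from appearing twice.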

Enforcing a unique combination relationship in fields

Summary: I need any combination of [Field_1] and [Field_2] to be unique and for that uniqueness to be enforced. Note: this is about combinations, not permutations (order does not matter), and that's the difficulty.
In Depth:
I'm trying to track contacts for vendor software. I've set my DB up in the time-honored fashion, such that a Vendor record may have many contacts. The trick is that contacts may be related to each other and may not be related to the parent vendor record. An example:
1. SuperBrokenSoftware is a tool whose vendor I need to contact all the time.
2. WeMakeBadSoftware is the Vendor
3. Fred works for WeMakeBadSoftware
4. Gale works for WeHelpPeopleWhenOthersWont
Let's say Gale is the appropriate contact to fix my issue with the SuperBrokenSoftware.
There is no way, using the current hierarchy, to track Gale's relationship to SuperBrokenSoftware.
My solution is to keep track of these relationships in a table like so:
Field1 | Field2 | Field3
Fred | Gale | Gale handles specific issues for Fred
However, given this solution, Field_1 and Field_2 must be unique in combination. That is to say, the records:
Field1 | Field2 | Field3
Fred | Gale | "Gale handles specific issues for Fred"
Gale | Fred | "Gale is awesome - Fred sucks"
should be viewed as the same. Record 2 should not be allowed in the database because it is not unique.
What I have Tried:
Using Szudzik's bijective pairing function: a >= b ? a * a + a + b : a + b * b; where a, b >= 0
I can calculate a unique identifier for every combination, but Access cannot enforce uniqueness on a calculated field.
What is the best way to enforce a unique combination in Access?
Thanks in advance!!!
Create a new field for the unique identifier, give it a unique index, and create a Before Change data macro that inserts or updates the calculated identifier in that field.
The unique key can simply be the sorted concatenation of field1 and field2.
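The sorted-concatenation idea can be sketched in a few lines of Python. This only illustrates how the key is computed, not Access macro code; the function name pair_key and the choice of separator are assumptions. A separator that cannot occur in either value is used so that pairs like ("ab", "c") and ("a", "bc") do not collapse into the same key.

```python
def pair_key(a: str, b: str, sep: str = "\x1f") -> str:
    """Build an order-independent key for the pair (a, b)."""
    if sep in a or sep in b:
        raise ValueError("separator must not occur in either value")
    first, second = sorted((a, b))
    return f"{first}{sep}{second}"


# A set stands in for the unique index on the key field.
seen = set()


def try_insert(a: str, b: str) -> bool:
    """Return True if the pair is new, False if it is a duplicate."""
    key = pair_key(a, b)
    if key in seen:
        return False
    seen.add(key)
    return True


print(try_insert("Fred", "Gale"))  # True: first time this pair is seen
print(try_insert("Gale", "Fred"))  # False: same pair in the other order
```

Sorting before combining is also what makes a pairing function such as Szudzik's usable here: on its own it distinguishes (a, b) from (b, a), so the two ids would have to be put in a fixed order first.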

Find all Records that do not have a Foreign letter in a Sql Table

I have been looking and doing research, and I am having trouble trying to split a table into two files, where one file only has records made of English letters and special characters (such as ,.& () 0-9 - etc.) and a second file has all the records that contain a foreign letter.
I have tried variations of
SELECT * FROM TABLE WHERE Column_name like '%[a-zA-Z0-9]%'
but that would not get special characters
also
SELECT * FROM TABLE WHERE Column_name like '%[\040-\176]%'
The data looks like this (not actual Data)
Doha, The Black Pearl
Jefferson City & Wells
Wenston 89-100
St. Winchester (T)
Piñata Valley
Not süre how to Üse that U
I have 4000 records and want to quickly look through the table. I want all the records but the last two.
You're looking for REGEXP or RLIKE, not LIKE.
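The important part of a regular-expression approach is that the pattern has to cover the whole value, not just find one allowed character somewhere in it. As a hedged illustration outside the database, the following Python sketch splits the sample rows using the printable ASCII range (space through tilde, the same range as \040-\176 in the question). The function name split_records is an assumption for the example.

```python
import re

# Space (0x20) through tilde (0x7E): letters, digits and common punctuation.
PRINTABLE_ASCII = re.compile(r"[\x20-\x7E]*")


def split_records(records):
    """Split records into (printable-ASCII-only, everything else)."""
    ascii_only, other = [], []
    for record in records:
        if PRINTABLE_ASCII.fullmatch(record):
            ascii_only.append(record)
        else:
            other.append(record)
    return ascii_only, other


rows = [
    "Doha, The Black Pearl",
    "Jefferson City & Wells",
    "Wenston 89-100",
    "St. Winchester (T)",
    "Piñata Valley",
    "Not süre how to Üse that U",
]
english, foreign = split_records(rows)
print(english)
print(foreign)
```

With the sample data, the first four rows land in the first list and the last two in the second, which is the split the question asks for.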

Vlookup array multiple columns

Excel wizards,
I'm trying to build a report with a simple drop down list of names. Rather than try to explain in more detail, let me give you a sample dataset:
Table1:
Text | Person1 | Person2 | Person3
String here contains name(s) | Mike Smith | Robert Johnson | Suzy Q
Another string with name(s) | Dan Boy | John Michael | Bob Wise
Different string with name(s) | Robert Johnson | Suzy Q |
In my report sheet, I have a drop-down list of all the possible "persons" that I want to choose from, and I then want to return all matching values from the "Text" column in an array. I have been able to make it work with only one column using this formula, where C4 contains my choice in the drop-down list:
INDEX(Table1[#All],SMALL(IF(Table1[Person1]=$C$4,ROW(Table1[Person1])),ROW(1:1)),1)
The text column will contain all the names of the Person columns, but they are in a different case (all caps, can't change format for display purposes). Maybe a SEARCH function would be more useful? I'm not sure. I'm trying to avoid using a macro, but I am not completely opposed.
Let me know what you guys think, and thanks in advance!
Simply reorganize your table so that there's one row per name, then use VLOOKUP on the name to get the matching list.
Person | Text
Mike Smith | String with names
Robert Johnson | String with names
Suzy Q | String with names
Dan Boy | Second string with names
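The reorganization described above is an unpivot from one row per text to one row per (person, text) pair. Outside Excel, the idea can be sketched in Python as follows; the function names are assumptions, and the split of the second sample row into three names is a guess at the original layout.

```python
# Wide layout: one row per text, with up to three person columns.
wide_rows = [
    ("String here contains name(s)", ["Mike Smith", "Robert Johnson", "Suzy Q"]),
    ("Another string with name(s)", ["Dan Boy", "John Michael", "Bob Wise"]),
    ("Different string with name(s)", ["Robert Johnson", "Suzy Q", ""]),
]


def unpivot(rows):
    """Turn (text, [persons]) rows into one (person, text) row per name."""
    return [
        (person, text)
        for text, people in rows
        for person in people
        if person  # skip empty person cells
    ]


def texts_for(long_rows, person):
    """Return every text linked to person, ignoring letter case."""
    wanted = person.casefold()
    return [text for name, text in long_rows if name.casefold() == wanted]


long_rows = unpivot(wide_rows)
print(texts_for(long_rows, "ROBERT JOHNSON"))
```

Once the data is in the long layout, the lookup no longer depends on which of the Person columns a name happened to be in, which is what made the single-column formula hard to extend.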
Are you trying to build validations for teams? For example, select a team, and the next drop-down then offers only members of that team?
You can use OFFSET inside validation. In one cell, put a validation for the list of teams. In the other cell, create a list validation and use an OFFSET formula to return the range of members based on the selected team.
Edit: I'm not sure how to do this in a table, but this is how you would fill a range with VLOOKUP:
1. In the table with the entries, add a column with serial numbers running from 1 to n.
2. Just below the drop-down box, enter the numbers 1 to n in order.
3. VLOOKUP the serial number in the table; that is the row you are looking up.
4. For the column, use MATCH to find which column of the table holds the currently selected person.
5. Drag the formula down to fill n rows.

Cassandra, implementing high-cardinality indexes

As is well known, Cassandra is great with low-cardinality indexes and not so good with high-cardinality ones. My column family contains a field storing a URL value.
Naturally, searching for this specific value in a big dataset can be slow.
As a solution, I've come up with the idea of taking the first characters of the URL path and storing them in a separate column, e.g. test.com/abcd would be stored as the columns (ab, test.com/abcd).
That way, when a search by a specific URL value needs to be done, I can narrow it down by a factor of 26*26 by searching for "ab" first and only then looking up the exact URL in the result set.
Does it look like a working solution to reduce URL cardinality in Cassandra?
If you need this to be really fast, you probably want to consider having a separate table with the value that you are searching for as the column key. Key prefix searches are usually faster than column searches in BigTable implementations.
A problem with that is that a sequential scan is going to have to follow after you use the low-cardinality index, in order to finally arrive at the one specific URL queried.
As Chris Shain mentioned, you can build a separate column family to serve as an inverted index:
Column Family 'people'
ssn | name | url
----- | ------ | ---
1234 | foo | http://example.com/1234
5678 | bar | http://hello.com/world
Column Family 'urls'
url | ssn
------------------------ | ------
http://example.com/1234 | 1234
http://hello.com/world | 5678
The downside is that you need to maintain the integrity of your manual index yourself.
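What "maintaining the integrity yourself" means can be shown with a small in-memory sketch in Python, where two dictionaries stand in for the two column families. This is not Cassandra client code; the class and method names are assumptions, and it assumes each URL belongs to at most one person, as in the example rows.

```python
class PeopleStore:
    """Toy stand-in for the 'people' and 'urls' column families."""

    def __init__(self):
        self.people = {}  # ssn -> {"name": ..., "url": ...}
        self.urls = {}    # url -> ssn (the manual inverted index)

    def put(self, ssn, name, url):
        """Write a person and keep the inverted index in step."""
        old = self.people.get(ssn)
        if old is not None and old["url"] != url:
            # The URL changed: drop the stale index entry first.
            self.urls.pop(old["url"], None)
        self.people[ssn] = {"name": name, "url": url}
        self.urls[url] = ssn

    def find_by_url(self, url):
        """Look up a person by exact URL via the index, or return None."""
        ssn = self.urls.get(url)
        return None if ssn is None else self.people[ssn]


store = PeopleStore()
store.put("1234", "foo", "http://example.com/1234")
store.put("5678", "bar", "http://hello.com/world")
print(store.find_by_url("http://hello.com/world"))
```

Every write touches both structures, and an update has to remove the old index entry as well as add the new one. In a real cluster the two writes are separate operations, so the application also has to decide what happens when one of them fails.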

Resources