SQL database/column name convention + language code


Usually in master tables we have a name column, and it is usually in English.
But what is the best column naming convention if we have the name in different languages? Is en_name better than name_en?
(Keep in mind that I will use JPA and entities.) For now, I will need the following columns:
en_name for English.
ar_name for Arabic.
fr_name for French.
de_name for German.
Also, which is better to use: the two-letter ISO 639-1 codes or the three-letter ISO 639-2 codes?

It depends on what you need. If you just need separate columns to store name translations in the DB, I'd recommend name_de, name_en, ..., but that's just my preference: it's easy to read and the purpose of each column is clear.
But think about filtering (searching). If you prefer this approach with additional columns in the same table, you will need some kind of switch in the query (or stored procedure) to search the name for a specific language: for English you search name_en, for German name_de. So I would recommend a separate localization table with the columns id, [master_table]_id, language_code and text (or name).
For example, there is a table item with columns id and name.
I would create table item_localization with columns id, item_id, language_code and text.
So for the row (1, 'name') in table item, item_localization will contain (1, 1, 'en', 'name-en') and (2, 1, 'de', 'name-de').
I prefer to store the English name in the master table and also duplicate it in the localization table. In this case it's easy to filter by localized name, with something like
select item.*
from item
join item_localization on item_localization.item_id = item.id
where item_localization.language_code = 'de' and item_localization.text like '%some_text%'
But it's just my experience and thoughts.
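To make the localization-table idea concrete, here is a minimal sketch using Python's built-in sqlite3 module (SQLite stands in for whatever database you actually use). The table and column names follow the answer above; the UNIQUE constraint and the find_items helper are my own assumptions, not part of the original suggestion.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL              -- default (English) name
    );
    CREATE TABLE item_localization (
        id            INTEGER PRIMARY KEY,
        item_id       INTEGER NOT NULL REFERENCES item(id),
        language_code TEXT NOT NULL,    -- ISO 639-1 code such as 'en' or 'de'
        text          TEXT NOT NULL,
        UNIQUE (item_id, language_code) -- assumption: one translation per language
    );
    INSERT INTO item VALUES (1, 'name');
    INSERT INTO item_localization VALUES
        (1, 1, 'en', 'name-en'),
        (2, 1, 'de', 'name-de');
""")

def find_items(conn, language_code, fragment):
    """Return (item id, localized name) pairs whose name in one language contains fragment."""
    rows = conn.execute(
        """
        SELECT item.id, item_localization.text
        FROM item
        JOIN item_localization ON item_localization.item_id = item.id
        WHERE item_localization.language_code = ?
          AND item_localization.text LIKE '%' || ? || '%'
        """,
        (language_code, fragment),
    )
    return rows.fetchall()

german_hits = find_items(conn, "de", "name")
```

The language is just a bound parameter here, so adding a fifth language means inserting rows rather than adding a column and another branch to every search query.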

Related

How to improve or index postgresql's jsonb array field?

I usually use a jsonb field to store array data.
For example, to store a customer's barcode info, I create a table like this:
create table customers(fcustomerid bigint, fcodes jsonb);
One customer has one row, all barcode info stored in its fcodes field, just like below:
[
{
"barcode":"000000001",
"codeid":1,
"product":"Coca Cola",
"createdate":"2021-01-19",
"lottorry":true,
"lottdate":"2021-01-20",
"bonus":50
},
{
"barcode":"000000002",
"codeid":2,
"product":"Coca Cola",
"createdate":"2021-01-19",
"lottorry":false,
"lottdate":"",
"bonus":0
}
...
{
"barcode":"000500000",
"codeid":500000,
"product":"Pepsi Cola",
"createdate":"2021-01-19",
"lottorry":false,
"lottdate":"",
"bonus":0
}
]
The jsonb array may store millions of barcode objects with the same structure. Perhaps this is not a good idea, but you know, when I have thousands of customers, I can store all the data in one table: one customer has one row in this table and all its data is stored in one field, which looks very terse and easy to manage.
For this kind of scenario, how can I insert, modify or query the data efficiently?
I can use jsonb_insert to insert one object, like this:
update customers
set fcodes=jsonb_insert(fcodes,'{-1}','{...}'::jsonb)
where fcustomerid=999;
When I want to modify an object, I find it a little difficult: I have to know the index of the object first. If I use the incremental key codeid as the array index, things look easy. I can use jsonb_set, like this:
update customers
set fcodes=jsonb_set(fcodes,concat('{',(mycodeid-1)::text,',lottorry}')::text[],'true'::jsonb)
where fcustomerid=999;
But if I want to query the objects in the jsonb array by createdate, bonus, lottorry or product, I have to use a jsonpath expression, like this:
select jsonb_path_query_array(fcodes,'$[*] ? (@.product == "Pepsi Cola")')
from customers
where fcustomerid=999;
or like:
select jsonb_path_query_array(fcodes,'$[*] ? (@.lottdate.datetime() >= "2021-01-01".datetime() && @.lottdate.datetime() <= "2021-01-31".datetime())')
from customers
where fcustomerid=999;
The jsonb index looks useful, but it seems to help when searching across different rows, and my operations mostly work within one row's single jsonb field.
I am very worried about efficiency. With millions of objects stored in one row's jsonb field, is this a good idea? And how can I improve efficiency in this scenario, especially for queries?

You are right to worry. With a huge JSON like that, you will never get good performance.
Your data don't need JSON at all. Create a table that stores a single barcode and has a foreign key reference to customers. Then everything will be simple and efficient.
Using JSON in the database is almost always the wrong choice, judging from the questions in this forum.
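As a rough illustration of the "one row per barcode" design, here is a sketch in Python with the built-in sqlite3 module standing in for PostgreSQL. The table name customer_codes, the index, and storing the empty lottdate as NULL are my assumptions; the column names are taken from the JSON keys in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE customers (fcustomerid INTEGER PRIMARY KEY);
    CREATE TABLE customer_codes (            -- hypothetical table name
        fcustomerid INTEGER NOT NULL REFERENCES customers(fcustomerid),
        codeid      INTEGER NOT NULL,
        barcode     TEXT NOT NULL,
        product     TEXT NOT NULL,
        createdate  TEXT NOT NULL,           -- ISO dates compare correctly as text
        lottorry    INTEGER NOT NULL DEFAULT 0,
        lottdate    TEXT,                    -- NULL instead of ""
        bonus       INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (fcustomerid, codeid)
    );
    CREATE INDEX customer_codes_product ON customer_codes (fcustomerid, product);
""")
conn.execute("INSERT INTO customers VALUES (999)")
conn.executemany(
    "INSERT INTO customer_codes VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (999, 1, "000000001", "Coca Cola", "2021-01-19", 1, "2021-01-20", 50),
        (999, 2, "000000002", "Coca Cola", "2021-01-19", 0, None, 0),
        (999, 500000, "000500000", "Pepsi Cola", "2021-01-19", 0, None, 0),
    ],
)

# Modify one object: no array index needed, the key identifies the row.
conn.execute(
    "UPDATE customer_codes SET lottorry = 1 WHERE fcustomerid = ? AND codeid = ?",
    (999, 2),
)

pepsi = conn.execute(
    "SELECT barcode FROM customer_codes WHERE fcustomerid = ? AND product = ?",
    (999, "Pepsi Cola"),
).fetchall()

january_draws = conn.execute(
    "SELECT codeid FROM customer_codes"
    " WHERE fcustomerid = ? AND lottdate BETWEEN '2021-01-01' AND '2021-01-31'"
    " ORDER BY codeid",
    (999,),
).fetchall()
```

Each of the operations from the question (insert, modify, filter by product or date) becomes an ordinary statement that touches only the rows involved, and ordinary indexes can serve the filters.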

Searching for particular combinations across two columns (nvarchar)

I am trying to search a table for entries that feature a combination of specific values across two particular columns.
I'm having no problem performing the search using one condition:
SELECT *
FROM Table
WHERE [artist_id] IN ('ID1', 'ID2', etc)
But I'd like to add a second condition, something like this:
AND WHERE [track_name] IN ('NAME1', 'NAME2', etc)
A few notes:
"artist_id" and "track_name" are both formatted as nvarchar, with "track_name" taking the form of single words or phrases.
There are multiple entries for each "artist_id" and "track_name," but all combinations of the two are unique.
So, how can I combine these conditions into a single query?
Here's a snippet of the code:
SELECT *
FROM [Music].[dbo].[echonest_tracks]
WHERE [artist_id] IN ('AR03U0G1187B9B1D35', 'AR03U0G1187B9B1D35', etc)
AND [track_title] IN ('Location', 'Cape Vibes Got 'em?', 'Feeling Good (Instrumental Remix)', 'How my heart by you', etc)

I think this is what you are looking for since you are looking for combinations:
SELECT *
FROM [Music].[dbo].[echonest_tracks]
WHERE
([artist_id] = 'AR03U0G1187B9B1D35' AND [track_title] IN ('Location', 'Cape Vibes Got ''em?', 'Feeling Good (Instrumental Remix)'))
OR
([artist_id] = 'AR03U0G1187B9B1D35' AND [track_title] IN ('How my heart by you', etc))

You are almost there.
SELECT *
FROM Table
WHERE [artist_id] IN ('ID1', 'ID2', etc)
AND [track_name] IN ('NAME1', 'NAME2', etc)
should do it.
As for the single quote within a query string, see this answer:
How do I escape a single quote in SQL Server?
You need to double up a single quote to have it within a query string.
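If the query is built from application code, bound parameters avoid the quote-doubling problem entirely, and exact (artist, title) pairs can be matched with one OR group per pair. A small sketch with Python's sqlite3 module, assuming a simplified echonest_tracks table; the second artist id is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE echonest_tracks (artist_id TEXT, track_title TEXT)")
conn.executemany(
    "INSERT INTO echonest_tracks VALUES (?, ?)",
    [
        ("AR03U0G1187B9B1D35", "Location"),
        ("AR03U0G1187B9B1D35", "Cape Vibes Got 'em?"),
        ("ARXXXXXXXXXXXXXXXX", "Location"),   # made-up artist id
    ],
)

# The exact combinations we want; the apostrophe needs no escaping here.
wanted = [
    ("AR03U0G1187B9B1D35", "Cape Vibes Got 'em?"),
    ("ARXXXXXXXXXXXXXXXX", "Location"),
]

# One "(artist_id = ? AND track_title = ?)" group per wanted pair, joined by OR.
where = " OR ".join(["(artist_id = ? AND track_title = ?)"] * len(wanted))
params = [value for pair in wanted for value in pair]

rows = conn.execute(
    f"SELECT artist_id, track_title FROM echonest_tracks WHERE {where} ORDER BY artist_id",
    params,
).fetchall()
```

Note the difference from two independent IN lists: the row ("AR03U0G1187B9B1D35", "Location") is not returned, because that particular combination was not asked for.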

Selective PostgreSQL database querying

Is it possible to have selective queries in PostgreSQL which select different tables/columns based on values of rows already selected?
Basically, I've got a table in which each row contains a sequence of two to five characters (tbl_roots), optionally with a length field which specifies how many characters the sequence is supposed to contain (it's meant to be made redundant once I figure out a better way, i.e. by counting the length of the sequences).
There are four tables containing patterns (tbl_patterns_biliteral, tbl_patterns_triliteral, ...etc), each of which corresponds to a root_length, and a fifth table (tbl_patterns) which is used to synchronise the pattern tables by providing an identifier for each row, so row #2 in tbl_patterns_biliteral corresponds to the same row in tbl_patterns_triliteral. The four pattern tables are restricted such that no row in tbl_patterns_(bi|tri|quadri|quinqui)literal can have a pattern_id that doesn't exist in tbl_patterns.
Each pattern table has nine other columns, each of which corresponds to an identifier (root_form).
The last table in the database (tbl_words), contains a column for each of the major tables (word_id, root_id, pattern_id, root_form, word). Each word is defined as being a root of a particular length and form, spliced into a particular pattern. The splicing is relatively simple: translate(pattern, '12345', array_to_string(root, '')) as word_combined does the job.
Now, what I want to do is select the appropriate pattern table based on the length of the sequence in tbl_roots, and select the appropriate column in the pattern table based on the value of root_form.
How could this be done? Can it be combined into a simple query, or will I need to make multiple passes? Once I've built up this query, I'll then be able to code it into a PHP script which can search my database.
EDIT
Here's some sample data (it's actually the data I'm using at the moment) and some more explanations as to how the system works: https://gist.github.com/823609
It's conceptually simpler than it appears at first, especially if you think of it as a coordinate system.

I think you're going to have to change the structure of your tables to have any hope. Here's a first draft for you to think about. I'm not sure what the significance of the "i", "ii", and "iii" are in your column names. In my ignorance, I'm assuming they're meaningful to you, so I've preserved them in the table below. (I preserved their information as integers. Easy to change that to lowercase roman numerals if it matters.)
create table patterns_bilateral (
pattern_id integer not null,
root_num integer not null,
pattern varchar(15) not null,
primary key (pattern_id, root_num)
);
insert into patterns_bilateral values
(1,1, 'ya1u2a'),
(1,2, 'ya1u22a'),
(1,3, 'ya12u2a'),
(1,4, 'me11u2a'),
(1,5, 'te1u22a'),
(1,6, 'ina12u2a'),
(1,7, 'i1u22a'),
(1,8, 'ya1u22a'),
(1,9, 'e1u2a');
I'm pretty sure a structure like this will be much easier to query, but you know your field better than I do. (On the other hand, database design is my field . . . )
Expanding on my earlier answer and our comments, take a look at this query. (The test table isn't even in 3NF, but the table's not important right now.)
create table test (
root_id integer,
root_substitution varchar[],
length integer,
form integer,
pattern varchar(15),
primary key (root_id, length, form, pattern));
insert into test values
(4,'{s,ş,m}', 3, 1, '1o2i3');
This is the important part.
select root_id
, root_substitution
, length
, form
, pattern
, translate(pattern, '12345', array_to_string(root_substitution, ''))
from test;
That query returns, among other things, the translation soşim.
Are we heading in the right direction?

Well, that's certainly a bizarre set of requirements! Here's my best guess, but obviously I haven't tried it. I used UNION ALL to combine the patterns of different sizes and then filtered them based on length. You might need to move the length condition inside each of the subqueries for speed reasons, I don't know. Then I chose the column using the CASE expression.
select word,
translate(
case root_form
when 1 then patinfo.pattern1
when 2 then patinfo.pattern2
... up to pattern9
end,
'12345',
array_to_string(root.root, '')) as word_combined
from tbl_words word
join tbl_roots root
on word.root_id = root.root_id
join tbl_patterns pat
on word.pattern_id = pat.pattern_id
join (
select 2 as pattern_length, pattern_id, pattern1, ..., pattern9
from tbl_patterns_biliteral bi
union all
select 3, pattern_id, pattern1, pattern2, ..., pattern9
from tbl_patterns_triliteral tri
union all
...same for quad and quin...
) patinfo
on
patinfo.pattern_id = pat.pattern_id
and array_length(root.root, 1) = patinfo.pattern_length
Consider combining all the different patterns into one pattern_details table with a root_length field to filter on. I think that would be easier than combining them all together with UNION ALL. It might be even easier if you had multiple rows in the pattern_details table and filtered based on root_form. Maybe the best would be to lay out pattern_details with fields for pattern_id, root_length, root_form, and pattern. Then you just join from the word table through the pattern table to the pattern detail that matches all the right criteria.
Of course, maybe I've completely misunderstood what you're looking for. If so, it would be clearer if you posted some example data and an example result.
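For what it's worth, the single pattern_details layout suggested above could look roughly like this. It is a sketch in Python with sqlite3 rather than PostgreSQL, so the root is stored as a plain string instead of an array and the splicing is done by a hypothetical splice helper that imitates translate(); the sample rows reuse patterns quoted elsewhere in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_roots (root_id INTEGER PRIMARY KEY, root TEXT NOT NULL);
    CREATE TABLE pattern_details (
        pattern_id  INTEGER NOT NULL,
        root_length INTEGER NOT NULL,
        root_form   INTEGER NOT NULL,
        pattern     TEXT NOT NULL,
        PRIMARY KEY (pattern_id, root_length, root_form)
    );
    CREATE TABLE tbl_words (
        word_id INTEGER PRIMARY KEY, root_id INTEGER, pattern_id INTEGER, root_form INTEGER
    );
    INSERT INTO tbl_roots VALUES (4, 'sşm');
    INSERT INTO pattern_details VALUES (1, 2, 1, 'ya1u2a'), (1, 3, 1, '1o2i3');
    INSERT INTO tbl_words VALUES (10, 4, 1, 1);
""")

def splice(pattern, root):
    """Replace the digits 1..5 in pattern with the root's 1st..5th character,
    like translate(pattern, '12345', root) in PostgreSQL."""
    table = {ord(digit): char for digit, char in zip("12345", root)}
    # Digits beyond the root's length are deleted, as translate() does.
    for digit in "12345"[len(root):]:
        table[ord(digit)] = None
    return pattern.translate(table)

# The root's length and the word's root_form pick the single matching pattern row.
rows = conn.execute("""
    SELECT w.word_id, d.pattern, r.root
    FROM tbl_words w
    JOIN tbl_roots r ON r.root_id = w.root_id
    JOIN pattern_details d
      ON d.pattern_id = w.pattern_id
     AND d.root_form = w.root_form
     AND d.root_length = length(r.root)
""").fetchall()

words = [(word_id, splice(pattern, root)) for word_id, pattern, root in rows]
```

With this layout, choosing "the right table" and "the right column" both turn into ordinary join conditions, so no CASE expression or UNION ALL is needed.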

Database of common name aliases / nicknames of people

I'm involved with a SQL / .NET project that will be searching through a list of names. I'm looking for a way to return some results on similar first names of people. If searching for "Tom" the results would include Thom, Thomas, etc. It is not important whether this be a file or a web service. Example Design:
Table "Names" has Name and NameID
Table "Nicknames" has Nickname, NicknameID and NameID
Example output:
You searched for "John Smith"
You show results Jon Smith, Jonathan Smith, Johnny Smith, ...
Are there any databases out there (public or paid) suited to this type of task to populate a relationship between nicknames and names?

I'm adding another source for anyone who comes across this question via Google. This project provides a very good lookup for this purpose.
https://github.com/carltonnorthern/nickname-and-diminutive-names-lookup
It's somewhat simpler and less complete than pdNickName but on the other hand it's free and easy to use.

A google search on "Database of Nicknames" turned up pdNickName (for pay).
In addition, I think you only need a single table for this job, not two, with NameID, Name, and MasterNameID. All the nicknames go into the Name column. One name is considered the "canonical" one. All the nickname records use the MasterNameID column to point back to that record, with the canonical name pointing to itself.
Your two table schema contains no additional information and, depending on how you fill in the nickname table, you might need extra code to handle the canonical cases.
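Here is a minimal sketch of that single-table layout, using Python's sqlite3 module. The column names come from the description above; the sample rows and the similar_names helper are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Names (
        NameID       INTEGER PRIMARY KEY,
        Name         TEXT NOT NULL,
        MasterNameID INTEGER NOT NULL REFERENCES Names(NameID)
    );
    -- The canonical row points to itself; nicknames point to the canonical row.
    INSERT INTO Names VALUES
        (1, 'Thomas', 1),
        (2, 'Tom',    1),
        (3, 'Thom',   1),
        (4, 'John',   4),
        (5, 'Johnny', 4);
""")

def similar_names(conn, searched):
    """Return every name sharing a canonical name with `searched`, itself included."""
    rows = conn.execute(
        """
        SELECT other.Name
        FROM Names AS hit
        JOIN Names AS other ON other.MasterNameID = hit.MasterNameID
        WHERE hit.Name = ?
        ORDER BY other.Name
        """,
        (searched,),
    )
    return [name for (name,) in rows]
```

Because the canonical name points to itself, the same self-join works whether the user types the canonical name or one of its nicknames.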

I just found this site.
It looks like you could script it pretty easily.
http://www.behindthename.com/php/extra.php?terms=steve&extra=r&gender=m
I just wish I could automatically narrow this to English.

Another commercial name matching database is: http://www.basistech.com/name-indexer/
It looks quite professional (though potentially expensive).
They claim to support the following languages:
Arabic, Chinese (Simplified), Chinese (Traditional), Persian (Farsi / Dari), English, Japanese, Korean, Pashto, Russian, Urdu

Here is a github repo with csv of related names, and you can contribute back:
The first few lines show the format:
aaron,ron
abel,abe
abednego,bedney
abijah,ab,bige
abigail,ab,abbie,abby,gail
abner,ab,abbie,abby
abraham,abe,abram,bram
absalom,ab,abbie,app
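A file in this shape is easy to load into lookup tables. The sketch below assumes, judging only from the sample lines, that the first field of each row is the full name and the remaining fields are its nicknames; load_nicknames is a hypothetical helper, and the sample is inlined so the snippet runs on its own.

```python
import csv
import io
from collections import defaultdict

# A few lines in the format quoted above; a real run would open the CSV file instead.
SAMPLE = """\
aaron,ron
abel,abe
abigail,ab,abbie,abby,gail
abner,ab,abbie,abby
"""

def load_nicknames(lines):
    """Build name -> nicknames and nickname -> names lookups from 'name,nick1,nick2,...' rows."""
    name_to_nick = {}
    nick_to_name = defaultdict(set)
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        name, nicknames = row[0], row[1:]
        name_to_nick[name] = set(nicknames)
        for nickname in nicknames:
            nick_to_name[nickname].add(name)
    return name_to_nick, dict(nick_to_name)

name_to_nick, nick_to_name = load_nicknames(io.StringIO(SAMPLE))
```

The reverse lookup maps each nickname to a set, since one nickname (such as "abby" here) can belong to several names.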

There is a database out there called pdNicknames (found at http://www.peacockdata2.com/products/pdnickname/). It contains everything you need, at a cost of $500.

Similar format as Stan James's csv, but folded two ways for lookups:
Name to nickname: https://github.com/MrCsabaToth/SOEMPI/blob/master/openempi/conf/name_to_nick.csv
Nickname to name: https://github.com/MrCsabaToth/SOEMPI/blob/master/openempi/conf/nick_to_name.csv

To select similar sounding name use: (see MSDN)
SELECT SOUNDEX ('Tom')

This is a good choice: https://github.com/onyxrev/common_nickname_csv
id, name, nickname
1, Aaron, Erin
2, Aaron, Ron
3, Aaron, Ronnie
4, Abel, Ab
5, Abel, Abe
6, Abel, Eb
7, Abel, Ebbie
8, Abiel, Ab
9, Abigail, Abby
10, Abigail, Gail

Searching for and matching elements across arrays

I have two tables.
In one table there are two columns, one has the ID and the other the abstracts of a document about 300-500 words long. There are about 500 rows.
The other table has only one column and >18000 rows. Each cell of that column contains a distinct acronym such as NGF, EPO, TPO etc.
I am interested in a script that will scan each abstract of the table 1 and identify one or more of the acronyms present in it, which are also present in table 2.
Finally the program will create a separate table where the first column contains the content of the first column of the table 1 (i.e. ID) and the acronyms found in the document associated with that ID.
Can some one with expertise in Python, Perl or any other scripting language help?

It seems to me that you are trying to join the two tables where the acronym appears in the abstract, i.e. (pseudo SQL):
SELECT acronym.id, document.id
FROM acronym, document
WHERE acronym.value IN explode(document.abstract)
Given the desired semantics you can use the most straightforward approach:
acronyms = ['ABC', ...]
documents = [(0, "Document zeros discusses the value of ABC in the context of..."), ...]
joins = []
for id, abstract in documents:
    for word in abstract.split():
        try:
            # list.index scans the whole acronym list for every word
            index = acronyms.index(word)
            joins.append((id, index))
        except ValueError:
            pass # word not an acronym
This is a straightforward implementation; however, acronyms.index performs a linear search (of our largest array, no less) for every word of every abstract, so the running time grows with documents × words × acronyms. We can improve the algorithm by first building a hash index of the acronyms:
acronyms = ['ABC', ...]
documents = [(0, "Document zeros discusses the value of ABC in the context of..."), ...]
# map each acronym to its position so a lookup takes constant time
index = dict((acronym, idx) for idx, acronym in enumerate(acronyms))
joins = []
for id, abstract in documents:
    for word in abstract.split():
        try:
            joins.append((id, index[word]))
        except KeyError:
            pass # word not an acronym
Of course, you might want to consider using an actual database. That way you won't have to implement your joins by hand.
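To give an idea of the database route, here is a small sketch using Python's built-in sqlite3 module. The table names, the sample data and the punctuation stripping are all my own assumptions; the point is only that once each word is a row, the matching is a plain join.

```python
import sqlite3

acronyms = ["ABC", "NGF", "EPO"]
documents = [
    (0, "Document zero discusses the value of ABC in the context of EPO."),
    (1, "No acronyms here."),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE acronym (value TEXT PRIMARY KEY);
    CREATE TABLE document_word (document_id INTEGER, word TEXT);
""")
conn.executemany("INSERT INTO acronym VALUES (?)", [(a,) for a in acronyms])

# One row per word; strip surrounding punctuation so "EPO." still matches "EPO".
conn.executemany(
    "INSERT INTO document_word VALUES (?, ?)",
    [
        (doc_id, word.strip(".,;:()[]\"'"))
        for doc_id, abstract in documents
        for word in abstract.split()
    ],
)

# The database performs the join; DISTINCT drops repeated mentions in one abstract.
matches = conn.execute("""
    SELECT DISTINCT w.document_id, a.value
    FROM document_word AS w
    JOIN acronym AS a ON a.value = w.word
    ORDER BY w.document_id, a.value
""").fetchall()
```

The matches list is already in the shape of the result table asked for: a document ID next to each acronym found in that document.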

Thanks a lot for the quick response.
I assume the pseudo SQL solution is for MySQL etc. However, it did not work in Microsoft Access.
The second and the third are for Python, I assume. Can I feed acronym and document as input files?
babru

It didn't work in Access because tables are accessed differently (e.g. acronym.[id])
