Create a relational GWAS/Genomics database - database

I thought I'd ask before I try to build something from scratch.
Here is the type of problem I need to answer. One of our researchers comes to me and says "How many people in our data have such-and-such SNP genotyped?"
Our genetics data consists of several dozen GWAS files, typically flat delimited. Each GWAS file has between 100,000-1,000,000 SNPs. There is some overlap in the SNPs, but less than I'd originally thought.
Anyway, what I want to do is have some sort of structured database that links our participant IDs to a particular GWAS study, and then link that GWAS study to a list of SNPs, and I can write some kind of query that will pull all IDs that have the data. At no point do I need individual level genotype data, it is way easier to pull the SNP/Samples that I need once I know where they are.
So that is my problem and what I'm looking for. For anyone who works with a lot of GWAS data, I'm sure you're familiar with the problem. Is there anything (free or paid) that is built for this type of problem? Or do you have thoughts on what direction I might want to go if I need to build this myself?
Thanks.

Related

Should tables be dynamically created?

I'm making an application for collectors where users will upload lists of stuff they want to collect
For example, someone wants to collect flowers and uploads a list of some flowers he'd like to collect which may look something like this:
Rose, Chrystanthemums, Narcissus
Then they may choose which ones they have and work towards their goal.
Of course, users would be able to upload all kinds of different lists which brings up the question on how should this data be saved and accessed.
An approach i thought of would be to dynamically create a table everytime a user uploads a list, but upon looking it up it's a practice that's generally frowned upon and people usually suggested other alternatives. However i can't quite think of an alternative for my situation.
In this question on DBA stack exchange the reply was that there can be a few rare cases where this is a good practice.
Is my case one of those?
How should i go about designing it?
Also i understand that i'm not providing many details about this problem and i'm not asking for you to design this for me. I'm just asking for some general guidelines or a direction to move to.
Thank you in advance!
Hello #aMimikyu for an example as simple as the one you mention dynamic tables won't be a help, on the contrary migth degrade the performance of your software, as you can use a single table for storing the users list and then use a column of the table to identify the type of list the user is saving. But in my opinion, there is a case when the dynamic tables migth be usefull: if the entities (abstract representation of data) cannot be managed by the same class (model) on every different type of input. In this case the models and tables can be created on the fly.

Thought on creating a searchable database

I guess what I need is two things. First a way to input data into an Excel like application or a form builder, then a way to search those entries. For example.. CAR PART put a car Part A into Field 1 the next Field 2 would be car Type, followed by make and model. The fields would need to be made into a form consisting of preset inputs such as ( Title/Type ) and (Variable Categories) so a drop down menu, icons, or checkboxes would help narrow down the list of results. What pieces need to be in place to build/use a lightweight database/application design like this that allows inputting new information and then being able to search for latet search for variables? Also is there any application that does this already, a programming code to learn, or estimated cost and requirements to have it built?
First, there might be something off the shelf that does this already, and there are applications like this. Microsoft's Access would be a good place to start to see if it would fit your needs -- you can build forms and store data without much programming effort. As time goes on, you can scale up to a SQL Server.
It's not clear to me if your data is relational or not, and it might not matter much at first (any database will likely handle your queries to start). I originally thought your data was not relational, but re-reading your post, I'm not so sure now.
If that doesn't work, or you want more flexibility, then I'd start looking at NoSQL as an option. Some good choices include Mongo and RavenDB (there are many others).
You can program it yourself with just about any major language -- some provide more or less functionality based on the tie-in to the data.

SQLite database questions, problems with design (indexing/multiple fields)

I use stackoverflow a lot, but this is my first question here, so if i'm doing anything wrong just let me know. I'm not a programmer (I just do programming for my own needs) so I'm open to tutorial suggestions etc. I won't be offended if you just give me something to read and find the answers myself.
OK, to the point - I'm trying to write simple application to track my personal expenses and I have a problem with database design. I'm using VStudio to create the database (SQLite). I attached a diagram with my design and I have some questions.
My SQLite diagram
I don't know exactly how to design "Transactions" table. Fields like Date, Payment Type etc. seems to be easy enough but the idea was to store in this table information about transactions so I need to store multiple products there. I've read about it and created table "Transactions_Products" that will help with that. My problem is : where do I put quantity of products in the transaction? I can't think of a place to put it. I tried to find similar databases but couldn't find anything.
Second thing. I've read about indexing a lot, but I still can't grasp the idea. I don't know when to use it. Should I use it only on fields that I will be "querying" a lot?
Last one - is it better for such a small application just for myself to store my account balance in a separate table or should I just calculate it every time?
As I said, I don't need answers like: "do this, do that". If you just give me some good tutorials/articles I think I can find answers on my own, but I couldn't find it. Maybe I'm searching for it wrong.
Thank you in advance for any information.
where do I put quantity of products in the transaction?
Transactions is a bad table name as it's vague and has multiple meanings. Consider "payments", "purchase invoices", etc. See https://dba.stackexchange.com/questions/12991/ready-to-use-database-models-example/23831#23831 for some existing patterns.
Should I use [indexes] only on fields that I will be "querying" a lot?
There's no free lunch. Indexes take space, and can slow down inserts. Start with indexes on your primary keys (which is the default for SQLite), measure what is slow (looking at query plans) and add indexes if they help and if you have room.
is it better for such a small application just for myself to store my account balance in a separate table or should I just calculate it every time?
For an operational/transactional database like you describe, avoid storing calculated values. SQLite can count numbers quickly :)
Premature optimization is premature. Make it work first with full normalization. If you have performance problems, analyze what is really causing the slow-down and go from there.

Game Design and Architecture Advice for Text Adventures

I am trying to create an old-school Text Adventure Game. I'm a bit stuck on creating the World Map and rooms.
Should the room descriptions be part of the source code or should it be separated out? I was thinking of placing all such descriptions and room properties in a MySQL database and then have code to organize the logic of each room; putting each room description in with the actual source code seems a bit untidy.
Is this the preferred method of organising Descriptions in an adventure game? I was also thinking that this might be preferable since I could then query the database to find common properties about the data.
Any comments would be appreciated.
No, don't include level/room description within code, it is not dynamic this way.
Many many development frameworks now tend to go with separating code from data. So, for usual cases, we put game rooms data within files and read those to build the level and maybe enable the user to construct a new level on his own and eventually create a new file to carry the room data.
I work in a company where they build games, and they have the rooms separated from the code, they have it in mysql. Actually also the items that go in each room are in a table, and there is also a table that says which item is at which room at that moment.
Besides if you want to expand your game or do statistics about it is much better doing it with a database.
I will address two issues here. First, you are right to keep the data that defines the game away from the engine that will use it. This makes it so that you dont have to recompile everything in order to fix a typo or the like in the case of a text based game.
Secondly though, I would just question the use of MySQL. If you are making a dos typed game that is to be installed on people's systems you dont want a pre-req to be 'Install MySQL', hehe. There is a little program out there that is written in C that is free for all to use called SQLite that would suit your needs much better. If on the other hand the web is the medium for the release of this text based game, then have at it :)
You could just use a system like ADRIFT, then all you need to worry about are the descriptions and logic.
Should the room descriptions be part of the source code or should it
be separated out?
Separated out.
Try Prolog language.
It has similar database to SQL (actually logical predicates)
With some skill You may be able to check whether after some change is Your adventure finishable.
You may easily create this description by some logical predicates if You don't mind it being very "computer like".
You can see examples of Prolog text adventures in simple Google search.
I suggest using engines that already have a vibrant community around them. That way, your source code is only that; the source code of the game. I'd go with either TADS 3 or Inform 7
I would construct such a game as an interpreter which reads in room data, and based on the room data, allows for a set of valid commands (move, take, drop, change...). For movement you would have a pre-built graph with nodes being rooms and edges being allowed moves.
I would separate the descriptions from the code, having an object Room, that owns an object Description that calls a "database" through some Facade, so that you may use a file or a database or anything you wish. It would also eventually allow you to add some scripting to the room itself, like having objects in your description that have behaviors.

Practical to save thousands of data structures in a file and do specific lookups?

There's been a discussion between me and some colleagues that are taking the same class as me (and thus have the same project) about saving data to files and read from those files only when we need that specific data.
For instance, the project is something about managing a social network. I'm not going into specifics because it doesn't matter, but the idea is to use the best data structures to manipulate this data.
Let's say I'm using an Hash Table to save the users profile data. Some of them argue that only some specific information should be saved in the data structures, like and ID that represents an user. Everything else should be put on files. We should access the files to get that data we want when we want.
I don't think this is practical... It could be if we were using some library for a database like SQLite or something, but are not and I don't think we are supposed to. We are only supposed to code everything ourselves and use C functions, like these. Nor do I think we are supposed to do a perfect memory management. The requisites of the project are not for us to code a database, or even a pseudo-database. What this project demands of us, are the best data structures (as long as we know how to justify why we picked those instead of others) to store the type of data and the all data specified for the project.
I should let you know that we had 2 classes before where the knowledge we got there is to be applied on this project. One of those dealt with the basis of C, functions, structures, arrays, strings, file IO, recursion, pointers and simple data structures like binary trees and linked lists, stuff like that. The other one was about more complex data structures, hash tables, AVL trees, heaps, graphs, etc... It also talked about time complexity, big O notation and stuff like that.
For instance, let's say all I have in memory is the IDs of the users and then I need to find all friends of a specific user. I'll have to process the whole file (or files) finding out the friends of that user. It would be much easier if I could have all that data in memory already.
It makes no sense to me that we need to pick (and justify) the data structures that we best see fit for the project and then only use them to lookup for an ID. We will then need to do a second lookup, to get the real data we need, which will take it's time, won't it? Why did we bother with the data structures in the first place if we still need to get to search a bunch of files on the hard drive?
How could it be possible, using standard C functions, coding everything manually and still simulate some kind of database? Is this practical at all?
Am I missing something here?
It sounds like the project might be more about how you design the relationships between your data "entities," and not as much about how you store them. I don't think storing data off in files would be a good solution - file IO will be much slower than accessing things in memory. If you had the need to persist data on the disk, you'd probably want to just use a database, rather than files (I know it's an academic course though, so who knows).
I think you should focus more on how you design your data types, and their relationships, to maximize the speed of lookups, searches, etc. For example, you could store all the users in a linked list, or store them in a tree, or a graph, but each will have its implications on how fast you can find users, etc. Depending on what features you want in your social networking site, there will be different designs that will allow different types of behavior to perform better than it would in other designs.
From what you're saying I doubt that you need to store anything on disk.
One thing that I would ask the teacher is if you're optimizing for time or space complexity (there will be a trade off between these two depending on what you're trying to achieve).
That can certainly be done. The resource forks in Mac System 5-8 files were stored as binary indexed databases (general use of the term, don't think SQL!). (I think the interface was actually written in assembly, but I could do it in c).
The only thing is: it's a pain in the butt. Such files typically need to start with some kind of index or header, and then hold a bunch of records at predictable locations. (OK, sometimes the first index just points at some more indexes. How many layers of indirection do you care to manage?)
If you're going to do it, just remember: binary mode access.
Hmm... what about persistent storage?
If your project requires you to be able to remember friend data between two restarts of the app, then don't you think file storage (or whatever else becomes an issue)?
I'm having a very hard time figuring out what you are trying to ask here.
But there is a general rule that may apply:
If all of your data will fit in memory at once, it is usually best to load all of it into memory at once and keep it there. You write out to a file only to save, to exit, or for backup.
There are lots of exceptions to this rule, but for a class project where this is going to be the only major application running on the machine, you may as well store everything in memory. After all, you have already paid for the memory; you don't want it just sitting there idle.
I may have completely misunderstood the question you are trying to ask...

Resources