My lab is doing a lot of sequencing, but the way the sequences are documented makes it difficult to retrieve them or keep track of the data. I would like to create a database that has following features:
-A Graphical user interface to allow one to upload/retrieve/view data, and can incorporate links to quickly BLAST or analyse the sequences with other online tools.
-allows one to access it in the command line
-that has another section on the GUI that has records of what's in the lab, what needs to be ordered etc.
I wanted to know if there are general database templates I can adopt and modify to suit my lab needs? I have no experience in database design but have read about mySQL.
What are the first steps I should take in embarking on this project?
Thank you!
This is an interesting question and problem domain (one I now have expierence with btw). Your first step is to decide on a general architecture and then select technologies for this.
For the web/graphical side, there are lots of off the shelf components (I assume you are aware of tools like AntiSMASH, JBrowse, etc). But you will need to evaluate these. That is way outside the scope of the db side however.
On the database side, PostgreSQL performs admirably here. I have worked on a heavily loaded 10+TB db which was specifically storing sequencing data, BLAST reports, and so forth. If you add stuff like PostBIS on top of that, you get something quite functional.
A lot of the heavier portions of the industry however are using Hadoop because of the fact that the quantity of data available is increasing very rapidly but the amount of expertise required to make that work is also appropriately higher.
I have inherited a complex, poorly documented SQL Server database and am trying to reverse engineer the tables, their relationships and in general how the data fits together. To do this I am drawing the table diagrams using Visio, attempting to determine the data relationships using the existing queries (there are no relationships actually defined) and using the application logic (what little there is) to help.
This is however quite challenging given the lack of documentation / obfuscated nature of this system and I was wondering if there are any techniques / approaches that can help?
Not an easy job, and the bigger it is the harder it is. I do however have one small trick for discovering the necessary surface area of the database - remove all user access!
This sort of database often runs with full access (i.e. users are DBO level or above) so anyone can do anything. If you remove all user access (but not your own!), then start application testing and enable minimum permissions on a per object basis, you will eventually reveal the surface area of the database (the exposed objects). This information is very useful in helping to identify dead objects - your job is hard enough as it is without having to document SPs that are no longer used for example.
How successful this technique can be depends on how thorough you think you can be in your testing, how sure you can be that someone somewhere doesn't have an Excel workbook that connects to the database once a year to run some query that no-one else ever does.
I'm trying to design a database for a small project I'm working on. Eventually, I'd like to make it a web-app, but right now I don't mind just experimenting with data offline. However, I'm stuck in a crossroads.
The basic concept would be a user inputs values for 10 fields, to be compared against what is in the database, with each item having a weighted value. I know that if I was to code it, that I could use a look-up tables for each field, add up the values, and display the result to the end-user.
Another example would be having to get the distance between two points, each point stored in a row, with the X value getting its own column as well as the Y value.
Now, if I store data within a database, should I try to do everything within queries (which I think would involve temporary tables among other things), or just use simple queries, and manipulate the rows returned within the application code?
Right now, I'm thinking to go for the latter (manipulate data within the app) and just use queries to reduce the amount of data that I would have to go sort through. What would you guys suggest?
EDIT: Right now I'm using Microsoft Access to get the basics down pat and try to get a good design going. IIRC with my experience with Oracle and MySQL you can run commands together in a batch process and return just one result. But not sure if you can do that with Access.
If you're using a database I would strongly suggest using SQL to do all your manipulation. SQL is far more capable and powerful for this kind of job as compared to imperative programming languages.
Of course it does imply that you're comfortable in thinking about data as "sets" and programming in a declarative style. But spending time now to get really comfortable with SQL and manipulating data using SQL will pay off big time in the long run. Not only for this project but for projects in the future. I would also suggest using stored procedures over queries in code because stored procedure provide a beautiful abstraction layer allowing your table design to change over time without impacting the rest of the system.
A very big part of using and working with databases is understanding Data modeling, normalization and the like. Like everything else it will be a effort but in the long run it will pay off.
May I ask why you're using Access when you have a far better database available to you such as MSSQL Express? The migration path from MSSQL Express to MSSQL or SQL Azure even is quite seamless and everything you do and experience today (in this project) completely translates to MSSQL Server/SQL Azure for future projects as well as if this project grows beyond your expectations.
I don't understand your last statement about running a batch process and getting just one result, but if you can do it in Oracle and MySQL then you can do it in MSSQL Express as well.
What Shiv said, and also...
A good DBMS has quite a bit of solid engineering in it. There are two components that are especially carefully engineered, namely the query optimizer and the transaction controller. If you adopt the view of using the DBMS as just a stupid table retrieval tool, you will most likely end up inventing your own optimizer and transaction controller inside the application. You won't need the transaction controller until you move to an environment that supports multiple concurrent users.
Unless your engineering talents are extraordinary, you will probably end up with a home brew data management system that is not as good as the one in a good DBMS.
The learning curve for SQL is steep. You need to learn how to phrase queries that join, project, and restrict data from multiple tables. You need to learn how to handle updates in the context of a transaction.
You need to learn simple and sound table design and index design. This includes, but is not limited to, data normalization and data modeling. And you need a DBMS with a good optimizer and good transaction control.
The learning curve is steep. But the view from the top is worth the climb.
I'm trying to populate a table with user information in a MS SQL database with information from multiple data sources (i.e. LDAP and some other MS SQL databases). The process needs to run as a daily scheduled task to ensure that the user information table is updated frequently.
The initial attempt at this query/ update script was written in VBScript and would query each data source and then update the user information table. Unfortunately, this takes very long to run and update the user information table.
I'm curious if anyone has written anything similar and if you recommend or noticed a performance improvement by writing the script in another language. Some have recommended Perl because of multi-threading, but if anyone has any other suggestions on ways to improve the process or other approaches could you share tips or lessons learned.
It's good practise to use Data Transformation Services (DTS) or SSIS as it has become known for doing repetitive DB tasks. Although this won't solve your problem, it may give some pointers to what is going on as you can log each stage of the process, wrap it in transactions etc. It is especially well suited for bulk loading and updates, and it understands VBScript natively so there should be no problem there.
Other than that I have to agree with Brian, find out what's making it slow and fix that, changing languages is unlikely to fix it on its own, especially if you have an underlying issue. As a general point my experience when using LDAP, which is pretty small, was it could be incredibly slow reader bulk user details.
I can't tell you how to solve your particular problem, but whenever you run into this situation you want to find out why it is slow before you try to solve it. Where is the slow down? Some major things to consider and investigate include:
getting the data
interacting with the network
querying the database
updating indices in the database
Get some timing and profiling information to figure out where to concentrate your efforts.
Hmmm. Seems like you could cron a script that uses dump utils from the various sources, then seds the output into good form for the load util for the target database. The script could be in bash or Perl, whatever.
Edit: In terms of performance, I think the first thing you want to try is to make sure that you disable any autocommit at the beginning of the load process, then issue the commit after writing all the records. This can make a HUGE performance difference.
AS MrTelly said, use SSIS or DTS. Then schedule the package to run. Just converting to this alone will probaly fix your speed issue as they have tasks that are optimized for bulk inserting. I would never do this in a script language rather that t-SQl anyway. Likely your script works row by row instead of on sets of data but that is just a guess.
I am having a real problem at work with a highly ingrained developer obsessed with ms access. Users moan about random crashes, locking errors, freeze's, the application slowing down (especially in 2007) but seem to be very resistant to moving it. Most of the time they blame the computer and can't be convinced it's the fact its a mdb sat on a network drive and nothing to do with the hardware sat in front of them which is brand new.
There is a front end vb program hanging off it but I don't think it would take more than a couple of weeks to adjust, infact I would probably re-write it as it has year on year messy code from a previous developer.
What are my best arguments to convince them we need to move it?
Does anyone else have similar problems with developers stuck in their ways?
how about the random, crashes, locking errors, freeze's, slow downs (sic).
A quick search on the web finds some useful materials:
Best Practices When Using Microsoft Office Access 2003 in a Multi-user Environment - if the changes here can't be implemented, or would effectively take a rewrite, then that is good ammunition for doing it right.
SQL Server vs MS Access - pay special attention to feature limits. Eg You can only have 32,000 objects in an access DB. Caveat: though it says 255 concurrent users, and that is probably a technical limitation, the practical limitation is really MUCH lower.
It's hard to convince people that are not willing to learn and are not open to new ideas. You can go on about speed issues, concurrency issues, security problems.. but ultimately, some people will just never listen. Go over their heads. Rewrite it in tools from this decade and show them up. Refuse to be involved with the project and further. I don't know what the political situation is, but technically, MS access is wrong for what you are doing, from what you've described.
come in on a weekend, copy the database to sql server, change the app's connect-strings to sql server, retest the application, then uninstall ms-access...everywhere.
then don't say anything about it, let him think that the problems 'fixed themselves' and that the users are still using ms-access
To me it depends on how many concurrent users you have and how big the database is. If you have more than 5 concurrent users then you should be thinking about a database server. The network traffic starts to get out of hand and with each concurrent user you add it just gets worse.
I have created reliable access based systems for years. If you are having random crashes, locking issues, and slow downs then you aren't doing something right. I typically will have an mda local with the mdb on the network when creating an app in access. To have good performance it's key to have the proper indexes and queries optimized for getting just the data you need. Whether using a separate app, access, or some app running against sql server you need to actively handle record locking properly. You can't just blindly let access lock your records.
Forget the arguments about DB Size, it is an uninformed reason to shift to a client-server platform in 90% of the cases I hear it brought up.
Your best arguments are based on features explained at a low tech level:
(1) You can backup and perform maintenance on the DB without kicking out the users (which introduces costly downtime).
(2) Faster recovery if data is accidentally deleted/mangled or corrupted. Again, less risk and less downtime. This is always a good foundation for a business case.
(3) If (and only if) you anticipate the need to scale quite a bit, the upgrade will better allow that.
(4) If you need to run automated jobs/updates, SQL can do this much more elegantly.
Remember the contra-indications for SQL, it is easy to get on your technical high-horse about this platform versus that, but you have to balance the benefits against the costs.
SQL is a Helluva lot more expensive to maintain as it requires dedicated hardware, expensive licenses (Server OS and DB) and usually at least a part time DBA that is going to cost you a bare minimum of $75K (if you get luck AND work out of Podunk Iowa).
The best possible advice I can give you is to make sure that you have a good attitude and are known as someone who does quality work and gets things done. It sounds like you don't have any control in the situation so what you need is influence.
Find a way to solve a problem (probably a different one that is less threatening to the people involved) in the way you are suggesting. Make it work blindingly fast and flawlessly. Make it work so well that people start asking for you when they need something done. Get it done quickly, which you should be able to do because you'll be using the right tools for the job.
Be a good person to work with, not the PITA that knows how everyone else should write their code. Be able to give an answer for what you might do differently and why, but don't automatically assume that your ideas are always the best. Maybe there are trade-offs that you don't know about -- no money in the budget for the extra CALs, we have this other app that needs to be done first. This doesn't sound like your situation, but looking for opportunities to understand before making constructive criticisms can go a long way to helping people be receptive.
The other thing is that this probably has nothing to do with the technical aspects of the situation and everything to do with the insecurities of the other developer. "This is all I know. If we change it, I won't understand it and then where will I be." Look for ways to help the other guy grow -- when he's having a problem, find resources that will help him develop good technical solutions. Suggest that everyone in your department get some training in new technologies. Who knows, one good SQL Server course and the guy could become the SQL Server evangelist in the organization because now THAT'S what he knows.
Lastly, know when to cut your losses, so to speak. If you find that you're not able to do anything about the situation, don't add to the complaining. Move on to something that you can control and do it as well as you can. Maybe in the future you'll be in a position that you do have control or influence in the situation and can do something about it. If you find that you're in a company that's more dysfunctional than most, find a way to move on to a place where the environment is better.
It is possible, and actually fairly easy, to convert an Access database to having the tables/views in SQL Server while still using the Access app as a front-end.
From there, your Access-obsessed developer can still have fun with all that VBA code. Meanwhile, on the back-end, you add indexes and such to speed everything up. Maybe someday you get lucky, and he asks about stored procedures. Then, the app is just a front-end, and who cares what it's written in? Your data is safe in SQL Server.
It is possible for you to do this yourself, but just leave the production app ALOOOOOOOOOOOONE. Take a copy, and convert that copy. Then, host it for a couple of users to TEST drive .. make your version of the Access app show "TEST APP" in big red letters. If your developer asks what you're doing, you can say the truth -- you are testing to see if converting only the tables/views might be of some help to the overall app.
This way, you get the best of both worlds, keep your developer happy, make the users happier (hopefully), and if you play it right, your bosses will know that you handled a knotty personnel issue with your technological prowess and your maturity.
I once had similar problems with someone I would not hesistate to call a complete idiot.
It was not possible to convince them of the issues with access. In the end it was easier to force the issue than do it "nicely", cruel to be kind.
If they resist then you can always go above their head. Management must be aware of crashes and stability related issues. Present a plan to them to improve stability and they are likely to at least listen. They will probably then want a meeting with all developers to discuss so go into it armed with plenty of ammo.
More than "How to convince them", let's talk about "How to do it without anybody noticing"!
First of all I advice you not to mix together the code optimisation issue and the SQL server one. Do not give users a chance to complain about SQL while bugs are related to something else.
If your code is really unbearable, rewrite the app before switching to SQL, keeping in mind the following points to make the final transition to SQL Server completely transparent for final users.
This is what we did 18 months ago, and I am sure we still have users thinking our database is Access:
Export current access database to SQL through available Wizard in access for testing purposes (many problems might occur, and you could need another tool such as the one proposed here).
Create a unique connection object at the application level, so that you can freely switch from Access to SQL at any time (at development level, you can even add an input box at startup to ask which connection to use). We chose an ADODB connection object, but it will also work with ODBC connection.
In case you use SQL syntax to update tables, make sure that all SELECTs, INSERTs, UPDATEs and DELETEs make use of this connection. In case you use recordset, make sure that all of them use this connection at opening time.
When needed, update all connexion specific code by adding a "SELECT CASE" type_Of_TheConnexion options
Switch to SQL connection ..and debug till you're done!
The problems you will find are mainly linked to SQL syntax, where MSSQL uses ' instead of " and # as separators. Date format is also an issue, where standard SQL format is 'YYYYMMDD' while MS-Access format depends on computer locals (beware of conversions from date to string!) and is stored as "YYYY-MM-DD" (if I remember ...). Boolean in SQL are 0 and 1, while they are True/False or 0/-1 in Access ...
Test, update code, and when you are ok, make a new data transfer, lock your app on the SQL connection, and distribute a new runtime.
It depends on the type of application and data load of your database but Access is quite efficient, even over the network.
Depending on the amount of data your users deal with you could easily scale up to a 100 users on a network just using a from and back-end Access database.
Looks like in your case a rewrite may be in order. If your application is data-centric if doesn't make much sense to develop it in VB6: the tools given by Access are much better than anything you'd be able to make, especially when considering Access 2007.
Upsizing to SQL Server is only really required if you're getting into issues of:
Security:
you need to make sure that only the rights users can access data. You can do your own security in Access, but it's never going to be as strong as SQL Server.
Scalability:
you're dealing with lots of data and complex queries or a lot of users and it would be better to have dedicated hardware to handle the load for the clients. The issue with this though is that while removing the pressure from the less-capable clients machines, you're adding a lot more to the server.
Integrity:
With the back-end database being just a file that needs R/W access for all connected clients, there's always the possibility that someone is going to do something bad or that a client may crash and leave the database corrupted.
If your number of users is average (I'd say 30), then there's probably no real need to upscale:
Use MS Access 2007 to develop your application, then just use the MS Access 2007 Runtime (it's free!) on all client machines to get a more modern user interface (uses the Ribbon and has lots of UI enhancements over previous versions).
You can't be the cheapness of that solution : you only need full retail version of MS Access and all the rest is free, regardess of the number of users!
Don't think that moving to SQL Server is going to improve performance of your queries: MS Access often does a better job of optimizing the queries for you (it knows what needs to be displayed and does lots of caching and optimization).
Make sure you only edit small amounts of data at any one time (don't use dynaset queries just to display vast amounts of data in a datasheet; use a snapshot instead and open a detail form that only contains the data to edit when necessary.
Cache complex queries locally.
Built some caching mechanism that leaves a copy of the results of a complex query on the local machine. The gain in performance is pretty amazing and if the query doens't change much (for instance a log of stock operations) you can just persist the complex/big query locally and append new records as necessary.
There is so much more to say.
Bottom line is: you may be looking at a rewrite, but don't dismiss Access as the solution because your current application was poorly written.
Try bechmarking and showing the stats to him
Making people change can sometimes be a real pain in the butt.
I would have to say the main argument would have stability and speed, but of course like you have said they already know this a still won't move.
Another thing to try would be to show them the power of LINQ to SQL and how much cleaner it would make your application. Like Daniel Silveira said you could try and throw a couple of stats there way and see if they are convinced.
We have a app build using MS access as a back end and I can't wait till we get our new SQL server so I can move everything to that.
You could show him the perf results comparing the two, but if he's really set in his ways and refuses to change, there isn't much you can do except force him somehow.
If you're his boss then just force him to change it to use SQL. If not, then convince your boss to force the change by showing him the perf results and explain it'll fix the issues you're having.
Errr, leave the team? You seem to be working with the totally wrong set of people. Now, if the team IS your company, then you are working with the wrong company.
Of course once you leave the company, you could tell your clients that you could solve the network problems on their own and make them leave the company as well. Then give them an improved system that works on SQL Server Express.