A colleague of mine uses Excel to merge and analyse datasets (~10k rows).
Her spreadsheets are mazes of VLOOKUP and nested IF formulas.
How can I convince her to take a look at databases?
What would be a good way to start? I'm an SQLite fan, but I wonder whether the entry threshold for Access is lower.
Are there any books you'd recommend to get started? I checked the SO question "What's a good book for introduction to databases for web developers" - any additions to the list there?
Thanks,
Simone
re: How can I convince her to take a look at databases?
Show her why your way is better.
Redo what she did in Excel with your preferred tool and the same input data, and see whether you can find differences in the output.
Also, once both systems are set up, run them side by side for a while, noting performance and maintenance differences. If she agrees your way is better, she might decide to use it.
Not a direct answer to your question, but as a developer who has done extensive work on data analysis in Excel, a few observations.
If the primary goal is data analysis, then Excel might be good enough.
Especially if the different data sets (you mentioned merging) are provided as CSV files, as and when required, going through the 'hassle' of first importing the data into a SQL database and then running queries to extract it for the analysis step might be too much.
Excel gives you the flexibility to play around with your data and easily try different things: charting, pivot tables, etc. If the reports your friend needs are more or less static, with only the data varying, then a simple Access/SQL database with a small application on top might be a better solution. But then again, if that is the case, your friend probably already has an Excel sheet with all the relevant formulas, where only the data needs to be plugged in.
For most of my data analysis in Excel, the only thing I have really missed is the ability to gather data using foreign keys. Once that is covered with VLOOKUP, the rest of the analysis is usually quicker/easier in Excel.
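For comparison, the lookup step that VLOOKUP handles in Excel is a single join in SQL. A minimal sketch, with hypothetical orders and customers tables keyed on customer_id:

    -- what VLOOKUP does in Excel: pull customer_name onto each order row
    SELECT o.order_id,
           o.amount,
           c.customer_name
    FROM orders AS o
    INNER JOIN customers AS c
        ON c.customer_id = o.customer_id;

Written with INNER JOIN, the same statement runs in SQLite, Access and SQL Server, so it is a reasonable first thing to show a spreadsheet user.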
I am old school and new to MVC. I like that MVC is more focused on actions. But Microsoft has bundled it with EF, so it's hard not to use EF nowadays.
This is what I thought:
1 - RDBMS stored procedures/packages are supposed to have better performance than LINQ. For instance, SQL Server's Transact-SQL now supports paging natively (see the sketch after this list); compared with LINQ, T-SQL definitely performs better. So, in terms of performance, LINQ is only good for people who don't know T-SQL or PL/SQL.
2 - I've tried using LINQ with stored procedures. It works, but it has many limitations. For instance, .dbml files are strongly typed, which prohibits any attempt to re-format the data, such as adding an anchor to a field for display. One might say you're not supposed to do that, so let me give an example: the business wants to make a column clickable in a grid. There are a number of ways to implement this; one of the quickest is to embed an anchor in the column returned from a stored procedure, with very little change on the UI, so QA only needs to test a few things. But with EF as the foundation, anything based on that model/class must go through QA again.
3 - Model-first or code-first won't get you a nicely normalized large-scale database implementation, because a developer who doesn't know T-SQL won't be good at RDBMS design.
4 - This is the most important issue: in an enterprise environment, we developers can NOT dictate schema and table definitions. Even with the database-first approach, sometimes we don't even know where the schema comes from. But that's what EF is good at, right, you might say: EF detects the whole schema and what is returned from stored procedures, then builds the whole data layer/classes for me. Great, but suppose there is a need for a real-time median price that is not in the database at all, so we add it with some customization code. That code will be gone the next time a scan and re-detect is needed, because a client request caused a tiny change in the database. How do we avoid this hassle of losing customization code?
5 - Sometimes we need to run the "update-database" command in the Package Manager Console so EF can work. It's almost impossible to explain to Operations and the DBAs that this is harmless during a release.
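On point 1, the native paging referred to is presumably the OFFSET/FETCH clause that arrived in SQL Server 2012; a minimal sketch against a hypothetical Products table:

    -- page 3 of a product list, 20 rows per page
    SELECT ProductId, Name, ListPrice
    FROM dbo.Products
    ORDER BY ProductId              -- OFFSET/FETCH requires an ORDER BY
    OFFSET 40 ROWS
    FETCH NEXT 20 ROWS ONLY;

LINQ providers typically translate Skip(40).Take(20) into exactly this kind of statement on newer SQL Server versions, so the performance gap on paging alone may be smaller than point 1 suggests.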
However, as EF is getting popular, there must be a new-school way to make it work. Can some experts educate the old school, please?
I've been researching SQL cursors recently, and a colleague of mine said that cursors are best used for auditing. I tried to look for materials on the internet, but had no luck.
Can anyone explain why cursors are good for auditing despite their disadvantages?
Like any task, it's about picking the right tool for the job. Some disparage the use of cursors due to obviously bad examples of their use, but cursors have their place. They are particularly useful for subsetting data and for reducing code redundancy:
Primarily, I use cursors to perform tasks on subsets of very large datasets, e.g. banking data. With billions of records, there are some operations you wouldn't want to do all at once, so looping through day by day is a good option. There are other methods of iterating through subsets, but a cursor performs well at this task: the operations are still set-based, just on smaller sets.
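A minimal sketch of that day-by-day pattern in T-SQL (the table and column names are hypothetical):

    DECLARE @d date;
    DECLARE day_cur CURSOR LOCAL FAST_FORWARD FOR
        SELECT DISTINCT TxnDate FROM dbo.Transactions;  -- one row per day

    OPEN day_cur;
    FETCH NEXT FROM day_cur INTO @d;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        -- still an ordinary set-based statement, just scoped to one day
        UPDATE dbo.Transactions
        SET Processed = 1
        WHERE TxnDate = @d;

        FETCH NEXT FROM day_cur INTO @d;
    END
    CLOSE day_cur;
    DEALLOCATE day_cur;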
Cursors are also great for looping through multiple tables/fields in a database: there is no need to re-write a procedure for multiple tables if it's going to do the same thing in each table, or if you are consistently working on a variety of databases. For example, I once had to analyze a multitude of log files generated by multiple systems, but they all had date and ip fields. It was trivial to have a cursor loop through each of the tables and combine all the relevant data in one spot.
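A sketch of that multi-table variant, assuming hypothetical log tables that share date and ip columns and a combined staging table:

    DECLARE @tbl sysname, @sql nvarchar(max);
    DECLARE tbl_cur CURSOR LOCAL FAST_FORWARD FOR
        SELECT t.name
        FROM sys.tables AS t
        WHERE t.name LIKE 'log[_]%';  -- hypothetical naming convention

    OPEN tbl_cur;
    FETCH NEXT FROM tbl_cur INTO @tbl;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        SET @sql = N'INSERT INTO dbo.CombinedLogs (SourceTable, LogDate, Ip) '
                 + N'SELECT ''' + @tbl + N''', [date], [ip] FROM ' + QUOTENAME(@tbl) + N';';
        EXEC sp_executesql @sql;
        FETCH NEXT FROM tbl_cur INTO @tbl;
    END
    CLOSE tbl_cur;
    DEALLOCATE tbl_cur;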
I wouldn't use a cursor to perform row-by-row actions unless necessary, and while I can't think of a use case off the top of my head, I'm sure some exist.
I am building a website that will have articles, policies, laws and other text content. I am storing all the data (in some cases articles with over 8,000 characters) in an MS SQL Server 2008 database. I have read articles saying that text data should not be stored in databases. Where should it be stored, then? In .txt files or something? I also want to search through the data. If it is stored in the DB, I can use stored procedures etc.; if it is stored in documents, I would need a tool like Lucene. Am I right? Is my approach of using a DB wrong for this project? Please enlighten me.
You will be using a DB of some description for this project no matter how you look at it, whether it be:
1) an old-fashioned flat-file database (txt documents; not recommended for large-scale projects, IMHO)
2) a traditional text-storing database
3) a database of documents
The argument over whether to use a DB of text or a DB of documents depends on which skills/knowledge you possess or are likely to get access to (or assistance with). It sounds to me like you are more comfortable with a DB of text, and in my opinion there is nothing wrong with that. Worst case, if there ends up being a genuine need for documents rather than straight text storage in the long run, you should be able to generate the documents automatically from the text database; I suspect doing the reverse would be a lot trickier (converting a load of proprietary documents to text for storage and insertion). Generating a plain-text file from a text database is trivial, and most vendor document formats support importing plain-text documents for subsequent formatting.
For a large project like this, you really need to spend some time considering what your documents are likely to be used for and by whom, and which methods best match that. If you are providing a database for people who heavily use MS Word and want to download your data, you probably need to consider a document DB. If it's just the information (and web-based tools) you want to provide, consider how you want to manipulate your own data.
This is all opinion, obviously, but my last piece of advice: make sure you use UTF-8 text from the outset if you go down the text route (bitter experience).
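One more point on the search requirement from the question: SQL Server 2008 has full-text search built in, so keeping the articles in the database does not rule out Lucene-style querying. A minimal sketch (the table, index and catalog names are made up):

    -- one-time setup
    CREATE FULLTEXT CATALOG ArticleCatalog;
    CREATE FULLTEXT INDEX ON dbo.Articles (Body)
        KEY INDEX PK_Articles       -- must be a single-column unique index
        ON ArticleCatalog;

    -- word-aware querying, e.g. from a stored procedure
    SELECT ArticleId, Title
    FROM dbo.Articles
    WHERE CONTAINS(Body, N'"data protection" OR privacy');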
I have several SQL Server 2005 databases, ranging from 20 to 600 tables, in an application with no documentation. I am looking for a database diagramming tool that is smart enough to pick out tables that seem to be related to one entity (e.g., tables related to Patient, tables related to Orders) or one piece of functionality (e.g., patient management, order management) and show them separately instead of drawing the entire database.
In the past, I have seen tables related to one piece of functionality represented in one color in ER diagrams. In a well-designed database there would perhaps be multiple schemas grouping the tables for each piece of functionality together, but here all the tables are in one schema, so I want a tool that is smart enough to suggest which tables should go together under one schema. It won't be perfect, but perhaps it can work out which tables belong together (for example, based on the relationships between them, or on which tables seem to be accessed together in the stored procs).
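As a rough first pass, I can already pull the foreign-key graph out of the catalog views and eyeball the clusters myself; a minimal sketch for SQL Server 2005:

    -- which tables reference which: raw material for grouping by functionality
    SELECT OBJECT_NAME(fk.parent_object_id)     AS referencing_table,
           OBJECT_NAME(fk.referenced_object_id) AS referenced_table
    FROM sys.foreign_keys AS fk
    ORDER BY referencing_table, referenced_table;

But doing that by hand for 600 tables is exactly what I want a tool to do for me.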
The bottom line is that I want to understand the data model as quickly as possible. A tool called SchemaSpy ( http://schemaspy.sourceforge.net/ ) seems to be headed in the right direction, but I was wondering if anyone knew of better/more comprehensive tools.
Thanks.
Have you tried Visio at all? While it does not satisfy everything you asked for, it can reverse-engineer a database and make very appealing diagrams with a little work.
I have never used it to understand an existing database, but I have used it to explain databases I have created.
You could have a look at wsSqlSrvDoc. It's a nice little tool that works with SQL Server extended properties and creates an MS Word document.
The print-out of all column properties (with foreign-key relations) works out of the box. For further descriptions of each field, you have to set up extended properties on those columns in SQL Server Management Studio.
The downside, however, is that it's not free (though quite affordable). But if you just need to create documentation for a DB that is more or less finished rather than a work in progress, the free trial would probably be enough.
This question is related to an older question, "A good database modeling tool?"
Among the answers there, fabFORCE.net dbDesigner might be what you are looking for.
If you have a database per client of a web application, instead of one database used by all clients, how do you go about providing updates and enhancements to all the databases efficiently?
How do you roll out changes to schema and code in such a scenario?
It's kind of difficult for us. We have a custom program that writes a lot of the SQL code for the different databases for us: essentially it writes the code once, then copies it over and over again, inserting the change-database commands etc. It also makes sure that the primary-key identities and the like stay in sync when they need to. Beyond that, I would look at Red Gate's products. They have saved us more than once here. With them you can easily compare the DBs and see what is different. A must when dealing with multiple copies.
Use a code generator or scripting language to implement the original schema and the updates to it over time.
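One common shape for such scripts is to keep a version table in each client database and make every change script idempotent, so the same script can be run safely against all copies; a minimal sketch (the version table is my own convention, not a built-in):

    -- run against every client database; safe to re-run
    IF NOT EXISTS (SELECT 1 FROM dbo.SchemaVersion WHERE Version = 42)
    BEGIN
        ALTER TABLE dbo.Customers ADD PreferredName nvarchar(100) NULL;
        INSERT INTO dbo.SchemaVersion (Version, AppliedOn)
        VALUES (42, GETDATE());
    END

A small driver (or one of the Red Gate tools mentioned in the other answers) then just runs the numbered scripts in order against each client database.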
I've used Red Gate's SQL Packager for this in the past. The beauty of this tool is that it creates a C# project that actually does the work, so if you need to, you can extend the default package to do other things, like inserting default values into new columns that have been added to the DB. In the end you have a nice tool you can hand to a technician; all they have to do to upgrade multiple DBs is point it at the database and click a button.
Red Gate also has a product called SQL Multi Script that lets you run scripts against multiple servers/DBs at the same time. I've never used it, but I imagine that if you're looking for something to use internally that doesn't need to be packaged up, you'd want to look at that.