DAL using typed dataset - dataset

Are there any performance issues in using Typed Data Sets as DAL? Is it a recommended approach? I am using it for listing purposes only (repeater). It has paging, sorting functionalities too.

It works through untyped DataSet and just incapsulates type casting.

A dataset includes a lot of information other than the data that you need in the list. Therefore, if you can read the data out in a different way it would give you less information to transfer and therefore better performance.
Having said that, depending on your app you may not notice the difference. Do whatever is easiest for you and then check the performance.

Related

Is it bad design to use arrays within a database?

So I'm making a database for a personal project just to get more than my feet wet with PostgreSQL and certain languages and applications that can use a PostgreSQL database.
I've come to the realization that using an array isn't necessarily even compliant (Arrays are not atomic, right?) with 1NF. So my question is: Is there a lack of efficiency or data safety this way? Should I learn early to not use arrays?
Short answer to the title: No
A bit longer answer:
You should learn to use arrays when appropriate. Arrays are not bad design themselves, they are as atomic as a character varying field (array of characters, no?) and they exists to make our lives easier and our databases faster and lighter. There are issues considering portability (most database systems don't support arrays, or do so in a different way than Postgres)
Example:
You have a blog with posts and tags, and each post may have 0 or more tags. The first thing that comes to mind is to make a different table with two columns postid and tagid and assign the tags in that table.
If we need to search through posts with tagid, then the extra table is necessary (with appropriate indexes of course).
But if we only want the tag information to be shown as the post's extra info, then we can easily add an integer array column in the table of posts and extract the information from there. This can still be done with the extra table, but using an array reduces the size of the database (no needed extra tables or extra rows) and simplifies the query by letting us execute our select queries with joining one less table and seems easier to understand by human eye (the last part is in the eye of the beholder, but I think I speak for a majority here). If our tags are preloaded, then not even one join is necessary.
The example may be poor but it's the first that came to mind.
Conclusion:
Arrays are not necessary. They can be harmful if you use them wrong. You can live without them and have a great, fast and optimized database. When you are considering portability (e.g. rewriting your system to work with other databses) then you must not use arrays.
If you are sure you'll stick with Postgres, then you can safely use arrays where you find appropriate. They exist for a reason and are neither bad design nor non-compliant. When you use them in the right places, they can help a little with simplicity of database structures and your code, as well as space and speed optimization. That is all.
Whether an array is atomic depends on what you're interested in. If you generally want the whole array then it's atomic. If you are more interested in the individual elements then it is being used as structure. A text field is basically a list of characters. However, we're usually interested in the whole string.
Now - from a practical viewpoint, many frameworks and ORMs don't automatically unpack PostgreSQL's array types. Also, if you want to port the database to e.g. MySQL then you'll
Likewise foreign-key constraints can't be added to an array (EDIT: this is still true as of 2021).
Short answer: Yes, it is bad design. Using arrays will guarantee that your design is not 1NF, because to be 1NF there must be no repeating values. Proper design is unequivocal: make another table for the array's values and join when you need them all.
Arrays may be the right tool for the job in certain limited circumstances, but I would still try hard to avoid them. They're a feature of last resort.
The biggest problem with arrays is that they're a crutch. You know them already and you want to use them because they're familiar to you. But they do not work quite like you expect, and they will only allow you to postpone a true understanding of SQL and relational databases. You're much better off waiting until you're forced to use them than learning them and looking for opportunities to rely on them.
I believe arrays are a useful and appropriate design in cases where you're working with array-like data and want to use the power of SQL for efficient queries and analysis. I've begun using PostgreSQL arrays regularly for data science purposes, as well as in PostGIS for edge cases, as examples.
In addition to the well-explained challenges mentioned above, I'm finding the biggest problem in getting third-party client apps to be able to handle the array fields in ways I'd expect. In Tableau and QGIS, for example, arrays are treated as strings, so array operations are unavailable.
Arrays are a first class data type in the SQL standard, and generally allow for a simpler schema and more efficient queries. Arrays, in general, are a great data type. If your implementation is self-contained, and doesn't need to rely on third-party tools without an API or some other middleware that can deal with incompatibilities, then use the array field.
IF, however, you interface with third-party software that directly queries the DB, and arrays are used to produce queries, then I'd avoid them in favor of simpler lookup tables and other traditional relational approaches.

More efficient to store text as file or in DB?

Imagine you're dealing with many strings of text that are about 10,000 characters long entered by users. Would it be more efficient to write those automatically onto pages or input them onto a table in a database? I hope that question is clear enough...
It depends on what sort of "efficiency" you're aiming for.
Here's what I mean:
will you be changing the content of your text strings?
what sorts of searches will you be doing?
when you extract the text do what do you do with it?
My opinion is that provided you're not going to change the content much, nor perform much analysis, you're better off with the database.
10k isn't particularly large, so either is fine. I would personally use the database, as it will allow you to easily search though.
Depends how you're accessing them, but normally using the FS would result in better performance. That's for the obvious reason the DB is another layer built on top of the FS, and using the FS directly, assuming no extra heavy processing (for example, have 100s of named files instead of one big bloated file ordered in a special order you need to parse), would save you the DBMS operations.
I'm wondering if SQLite would be the best of both worlds, or at least, the best database for that size of job.
The real answer her is what you're going to do with these strings.
Databases are meant to be able to quickly return specific records. If you're just going to SELECT * FROM Table and then concat it all together, there's no point in using a database.
However, if you have a relation between your data that you want to be able to search, then a database will likely be more efficient.
E.G., do you want to be able to pull up all the text records from a set of users on a set of dates? Find all records from users who match some records?
These kinds of loads will likely be more efficient than a naive implementation, and still probably faster than a decent one, even if it does avoid some access layers.
There are a lot of considerations. As others said - either approach would work fine for a small number of 10k rows (thousands).
But what's the rest of your app do? If it does everything in the database, then I'd be inclined to put this there as well; the opposite is true as well.
And how will you be selecting these? Do you need to do complex text searches? If so, a database might not be the best. Or, would you be adding new attributes, searching on those attributes - or matching them against data in other tables? In this common case a database would be better.
And if your data is really vast (many millions of 10k rows) and your performance requirements aren't terribly high - you may want to compress them and store them in the file system.
Lastly, how important is data quality? Given the features of a good database it's much easier to guarantee good data quality with a database.

DataSet, an old confusion

When should I use DataSet instead of DataReader?
When I must use DataSet instead of DataReader?
When should I use data in a disconnected fashion?
When I must use data in a disconnected fashion?
N.B.
I am not asking which is better. I need to know the appropriate scenarios of the use of DataSet. I am programming in .net for couple of years but I never seriously needed it.
One scenario, When you want to pass data from one layer to another layer of your application you could use dataset. For more information Dataset and DataReader
A DataSet holds all of the needed data records in memory, whereas a DataReader reads records from a data connection one record at a time.
DataSets are commonly filled with data using DataReaders.
Use a DataReader when you need a high-performance, forward-only reader.
Use a DataSet when you need to do something that requires all of the data to be present at once, such as serialization or passing data between tiers. However, as others have pointed out, using List<T> rather than a DataSet object provides better separation of concerns between tiers.
See http://articles.sitepoint.com/article/dataset-datareader and http://msdn.microsoft.com/en-us/magazine/cc188717.aspx for more info on this.

How to Handle Unknown Data Type in one Table

I have a situation where I need to store a general piece of data (could be an int, float, or string) in my database, but I don't know ahead of time which it will be. I need a table (or less preferably tables) to store this unknown typed data.
What I think I am going to do is have a column for each data type, only use one for each record and leave the others NULL. This requires some logic above the database, but this is not too much of a problem because I will be representing these records in models anyway.
Basically, is there a best practice way to do something like this? I have not come up with anything that is less of a hack than this, but it seems like this is a somewhat common problem. Thanks in advance.
EDIT: Also, is this considered 3NF?
You could easily do that if you used SQLite as a database backend :
Any column in a version 3 database, except an INTEGER PRIMARY KEY column, may be used to store any type of value.
For other RDBMS systems, I would go with Philip's solution.
Note that in my line of software (business applications), I cannot think of any situation where this kind of requirement would be needed (a value with an unknown datatype). Unless the domain model was flawed, of course... I can imagine that other lines of software may incur different practices, but I suggest that you consider rethinking your overall design.
If your application can reliably convert datatypes, you might consider a single column solution based on a variable-length binary column, with a second column to track original data type. (I did a very small routine based on this once before, and it worked well enough.) Testing would show if conversion is more efficiently handled on the application or database side.
If I were to do this I would choose either your method, or I would cast everything to string and use only one column. Of course there would be another column with the type (which would probably be useful for the first method too).
For faster code I would probably go with your method.

Evaluating HDF5: What limitations/features does HDF5 provide for modelling data?

We are in evaluating technologies that we'll use to store data that we gather during the analysis of C/C++ code. In the case of C++, the amount of data can be relatively large, ~20Mb per TU.
After reading the following SO answer it made me consider that HDF5 might be a suitable technology for us to use. I was wondering if people here could help me answer a few initial questions that I have:
Performance. The general usage for the data will be write once and read "several" times, similar to the lifetime of a '.o' file generated by a compiler. How does HDF5 compare against using something like an SQLite DB? Is that even a reasonable comparison to make?
Over time we will add to the information that we are storing, but will not necessarily want to re-distribute a completely new set of "readers" to support a new format. After reading the user guide I understand that HDF5 is similar to XML or a DB, in that information is associated with a tag/column and so a tool built to read an older structure will just ignore the fields that it is not concerned with? Is my understanding on this correct?
A significant chunk of the information that we wish to write out will be a tree type of structure: scope hierarchy, type hierarchy etc. Ideally we would model scopes as having parents, children etc. Is it possible to have one HDF5 object "point" to another? If not, is there a standard technique to solve this problem using HDF5? Or, as is required in a DB, do we need a unique key that would "link" one object to another with appropriate lookups when searching for the data?
Many thanks!
How does HDF5 compare against using something like an SQLite DB?
Is that even a reasonable comparison to make?
Sort of similar but not really. They're both structured files. SQLite has features to support database queries using SQL. HDF5 has features to support large scientific datasets.
They're both meant to be high performance.
Over time we will add to the information that we are storing, but will not necessarily want to re-distribute a completely new set of "readers" to support a new format.
If you store data in structured form, the data types of those structures are also stored in the HDF5 file. I'm a bit rusty as to how this works (e.g. if it includes innate backwards compatibility), but I do know that if you design your "reader" correctly it should be able to handle types that are changed in the future.
Is it possible to have one HDF5 object "point" to another?
Absolutely! You'll want to use attributes. Each object has one or more strings describing the path to reach that object. HDF5 groups are analogous to folders/directories, except that folders/directories are hierarchical = a unique path describes each one's location (in filesystems w/o hard links at least), whereas groups form a directed graph which can include cycles. I'm not sure whether you can store a "pointer" to an object directly as an attribute, but you can always store an absolute/relative path as a string attribute. (or anywhere else as a string; you could have lookup tables galore if you wanted.)
We produce HDF5 data on my project, but I don't directly deal with it usually. I can take a stab at the first two questions:
We use a write once, read many times model and the format seems to handle this well. I know a project that used to write both to an Oracle database and HDF5. Eventually they removed the Oracle output since performance suffered and no one was using it. Obviously, SQLite is not Oracle, but the HDF5 format was better suited for the task. Based on that one data point, a RDBMS may be better tuned for multiple inserts and updates.
The readers our customers use are robust when we add new data types. Some of the changes are anticipated, but we don't have to worry about breaking thing when adding more data fields. Our DBA recently wrote a Python program to read HDF5 data and populate KMZ files for visualization in Google Earth. Since it was a project he used to learn Python, I'd say it's not hard to build readers.
On the third question, I'll bow to Jason S's superior knowledge.
I'd say HDF5 is a completely reasonable choice, especially if you are already interested in it or plan to produce something for the scientific community.

Resources