Database Design for enterprise hardware stock and billing

I have to develop a database for an enterprise to manage its PC hardware and software stock, PC breakdowns, and PC assignments to users.
I'm a beginner at database design, so I'd like some advice on how to model the database properly from scratch.
I've already done some research and started something, but since you guys are experts, I'd like your take on it, please.
Details are not important; what I need to know is what tables you would make, and their relationships.
I've included a screenshot of the ERD I've come up with. Please have a look.
Here is what I've done so far. I know it may be poorly designed; keep in mind it's a first for me.
Basically, I have a list of users.
Let's go for a user table.
Each user can have many PCs (but one PC belongs to one user only).
So, a PC table, with an FK to users.
A PC consists of many types of identified hardware from a set list: MB, CPU, HDDs, RAM, GFX cards.
Here I made a category table for the hardware type (CPU, GFX, etc.).
Then make and model tables.
Finally, a model detail table to store all sorts of details like frequencies, sizes, etc.
I link those tables to a hardware table with FKs in it, so the combination gives me a piece of hardware.
---> I know there must be a way to link these tables in a much better/proper/more efficient manner there!
I want to be able to display each PC's user and hardware (chosen from a set list) and track the hardware via its billing.
So I make a hardware stock table. I link one hardware item to many stocked units with an FK in the hardware_stock table, along with its billing.
I can track failures/breakdowns for each PC.
There's a failure type table, and I link it to PC via an intersecting table, failure_event, containing FKs to pc_id and failure_type_id.
Then I keep comments in a separate table linked to failure_event.
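To make that concrete, here is a rough generic-SQL sketch of the tables I have so far (table and column names are just my reading of my ERD, not final):

    -- Rough sketch only; names and types are illustrative, not final.
    CREATE TABLE app_user (
        user_id   INT PRIMARY KEY,
        full_name VARCHAR(100) NOT NULL
    );

    CREATE TABLE pc (
        pc_id   INT PRIMARY KEY,
        pc_name VARCHAR(50) NOT NULL,
        user_id INT REFERENCES app_user (user_id)  -- one user can have many PCs
    );

    CREATE TABLE hardware_category (               -- CPU, GFX, RAM, HDD, MB...
        category_id INT PRIMARY KEY,
        label       VARCHAR(50) NOT NULL
    );

    CREATE TABLE hardware (                        -- one make/model of a component
        hardware_id INT PRIMARY KEY,
        category_id INT NOT NULL REFERENCES hardware_category (category_id),
        make        VARCHAR(50),
        model       VARCHAR(50)
    );

    CREATE TABLE hardware_stock (                  -- one row per physical unit bought
        stock_id    INT PRIMARY KEY,
        hardware_id INT NOT NULL REFERENCES hardware (hardware_id),
        billing_ref VARCHAR(50),
        pc_id       INT NULL REFERENCES pc (pc_id) -- NULL while the unit is on the shelf
    );

    CREATE TABLE failure_type (
        failure_type_id INT PRIMARY KEY,
        label           VARCHAR(100) NOT NULL
    );

    CREATE TABLE failure_event (
        failure_event_id INT PRIMARY KEY,
        pc_id            INT NOT NULL REFERENCES pc (pc_id),
        failure_type_id  INT NOT NULL REFERENCES failure_type (failure_type_id),
        occurred_on      DATE
    );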
So there it is. I've not tackled the software management yet.
Basically, it's the same as for hardware: I have a set list of different software and:
I want to know what's installed on the PCs.
I want to know the billings and licenses attached to the software stock.
Thanks for the feedback!

I think there are two options:
Add tables "software", "publisher", "software_category"; add a link table similar to "pc_hardware".
Change 'hardware' to "component", and add 'Software' as a component category.
Option 1 adds more tables, but is more distinct; managing software licences may be easier with this approach.
Option 2 has fewer tables, but is a little harder to understand. Under this approach, a software licence could be a type of 'product warranty' (but that feels messy!)
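For illustration, a minimal sketch of what option 1's extra tables and link table could look like (all names are placeholders; pc is the PC table from your existing schema):

    -- Sketch of option 1 only; names are placeholders, not a finished design.
    CREATE TABLE publisher (
        publisher_id INT PRIMARY KEY,
        name         VARCHAR(100) NOT NULL
    );

    CREATE TABLE software_category (
        category_id INT PRIMARY KEY,
        label       VARCHAR(50) NOT NULL
    );

    CREATE TABLE software (
        software_id  INT PRIMARY KEY,
        publisher_id INT NOT NULL REFERENCES publisher (publisher_id),
        category_id  INT NOT NULL REFERENCES software_category (category_id),
        name         VARCHAR(100) NOT NULL
    );

    CREATE TABLE pc_software (                 -- link table, analogous to "pc_hardware"
        pc_id       INT NOT NULL REFERENCES pc (pc_id),
        software_id INT NOT NULL REFERENCES software (software_id),
        licence_key VARCHAR(100),
        PRIMARY KEY (pc_id, software_id)
    );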
Just out of interest, have you investigated any open-source or COTS products that do asset management and issue tracking? I have no idea if there are any products available, but you can't be the first person/enterprise to want to do this sort of thing!

Your relationships are a bit much for me to go through.
Here's how you can check your own ERD. Using pencil and index cards, write down the screen fields of your proposed system screens and the report fields of your proposed system reports.
Something like this:
Add a new PC
------------
PC name
PC network
User
...
One index card per screen and report.
If you wish, you can sketch out the appearance of the screens and reports.
Once you've done this for every screen and report you can imagine, you can map your paper screens to your paper database (ERD). If you can easily imagine (code) the select, insert, update, and delete SQL for your paper database, then your design is good.
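For example, the "Add a new PC" card above might map to nothing more than this (using the placeholder table names sketched earlier in the thread):

    -- Hypothetical mapping of the "Add a new PC" card to SQL; names are placeholders.
    INSERT INTO pc (pc_id, pc_name, user_id)
    VALUES (42, 'WS-042', 7);

    -- The "User" field on the screen becomes a simple lookup:
    SELECT user_id, full_name
    FROM app_user
    ORDER BY full_name;

If the SQL really is that simple, the paper database passes the test for that screen.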

Related

Public Library Data over Time

I'm working on creating a better way to view the history of items held at the public library I work at. I have been using a combination of the built-in functions of SirsiDynix (our library software), Excel, and AutoHotkey to extract, manipulate, and display the data. I am currently stuck on designing a way to view the change in status of an item over time, since the system as it is only shows information based on the last transaction. For example, if I have the following item:
0000519227318 005.54 WAL 101 EXCEL 2013 TIPS, TRICKS & TIMESAVERS Walkenbach, John, author WE-WH 2013 7 7/13/2013 6/29/2015 35
I can tell you it was created on 7/13/2013, last checked out on 6/29/2015, and has been checked out a total of 7 times. But I am unable to tell you anything about the length of those checkouts, or when they occurred, or if the book had been missing for a year in the middle of that time period.
With Autohotkey and the SirsiDynix Director's Station I have been able to create "daily snapshot" csv files that indicate where an item is every day. However I am having trouble figuring out how to consolidate that information. Originally I was planning to simply add an additional column to the end of the record every day so that after the general item information you would have a series of numbers listing the changing location. The coding I have for AHK to do this is somewhat slow and I'm still working on how I would best display it in Excel regardless. However it occurred to me that there may be a much better way to handle this that could fully automate the process.
So I'm asking whether there are suggestions for either a simple database system to use or an improvement to my current method that could assist me. The queries I plan to run are simple: primarily, I want to type in an item number and have a chart display the status of that item over time, hopefully with something that could also show whenever the total checkouts have increased. I have been looking at stock market charts as examples, but as many people with those seem to want open, close, high, low values, the responses they get seem beyond what I may need. Additional queries for items with the longest period of time on the shelf relative to total time would be useful, although not initially required.
Any help as to what direction I may want to go would be appreciated. I have basic understanding of AutoHotKey and Excel, and I briefly used MySQL several years ago so I have a general feel of how a database can be used.
I'm not too familiar with your specific software or AutoHotkey, but for an efficient, secure, and scalable solution, consider any type of relational database management system (RDBMS), including server-level enterprise systems (some open-source, some proprietary) such as Oracle, SQL Server, DB2, PostgreSQL, and MySQL, or file-level databases including SQLite and MS Access. One main thing is to try to move away from the concept of flat-file spreadsheets and applications. Excel is simply not a database and should only be used as an end-use document for reporting or graphics/analytics on retrieved database content.
With a relational database you can maintain data across normalized, related tables linked together by primary and foreign keys. Essentially, you want to build a Library Management System, which can be composed of the following tables:
Items - unique list of items: ISBN, Catalog, Title, Author, Publisher, Category (Fiction, Nonfiction, Reference, Media); Primary Key: ItemID (autonumber/auto-increment)
Stock - physical copies of Items: Condition, Missing/Damaged Status, Cost, and Inventory Quantity; one-to-many relationship with Items; Primary Key: StockID, Foreign Key: ItemID
Checkouts - full history of checkout records: Stock item, CheckoutDate, CheckinDate, Notes; one-to-many relationship with Stock (each copy can have many checkout records); Primary Key: CheckoutID, Foreign Key: StockID
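A minimal MySQL-flavoured sketch of those three tables (not every listed column is included; identity and type syntax vary by RDBMS):

    -- Minimal sketch; adjust types and auto-increment syntax for your RDBMS.
    CREATE TABLE Items (
        ItemID    INT AUTO_INCREMENT PRIMARY KEY,
        ISBN      VARCHAR(20),
        Title     VARCHAR(255) NOT NULL,
        Author    VARCHAR(255),
        Publisher VARCHAR(255),
        Category  VARCHAR(50)            -- Fiction, Nonfiction, Reference, Media
    );

    CREATE TABLE Stock (
        StockID       INT AUTO_INCREMENT PRIMARY KEY,
        ItemID        INT NOT NULL,
        ItemCondition VARCHAR(50),
        Status        VARCHAR(50),       -- e.g. 'On shelf', 'Checked out', 'Missing'
        Cost          DECIMAL(8,2),
        FOREIGN KEY (ItemID) REFERENCES Items (ItemID)
    );

    CREATE TABLE Checkouts (
        CheckoutID   INT AUTO_INCREMENT PRIMARY KEY,
        StockID      INT NOT NULL,
        CheckoutDate DATE NOT NULL,
        CheckinDate  DATE,               -- NULL while the copy is still out
        Notes        VARCHAR(255),
        FOREIGN KEY (StockID) REFERENCES Stock (StockID)
    );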
Now with this schema, you can better manage each Item throughout its life cycle (newly arrived, checked-out periods, and retired/discarded/stored in the basement) with easy queries for real-time reporting. Additionally, you can use any general-purpose language (Java, C#, PHP, Python, Perl, VB) that can connect to any aforementioned RDBMS to build the interface or tool for this backend system. A host of free consoles are also available, including:
PhpMyAdmin (PHP/MySQL)
PgAdmin (PostgreSQL)
SQLite-manager (Firefox addin/SQLite)
Management Studio (SQL Server)
MS Access (Jet/ACE Engine and MS Office) - note that it is a common misnomer to call MS Access a database, when it is actually a GUI that connects by default to a Windows technology, the Jet/ACE SQL engine; that default can be swapped out for any aforementioned RDBMS
Here, Excel can connect via ODBC/OLEDB in VBA for reporting on checked-out item status, history, and current shelf stock. Depending on the RDBMS, you can even build triggers so that as soon as an item is checked out, a record is added to the Checkouts table, or code that in your tool or scripts. Finally, output to txt, csv, xml, or pdf reports, and email attachments to co-workers, the board, etc., can be integrated.
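As one possible reading of that trigger suggestion, here is a MySQL-flavoured sketch that logs a Checkouts row whenever a Stock copy's status flips to 'Checked out'. It assumes the table sketch above, and the Status values are my own invention:

    -- Sketch only; assumes the Items/Stock/Checkouts tables sketched above.
    DELIMITER //
    CREATE TRIGGER trg_log_checkout
    AFTER UPDATE ON Stock
    FOR EACH ROW
    BEGIN
        IF NEW.Status = 'Checked out'
           AND (OLD.Status IS NULL OR OLD.Status <> 'Checked out') THEN
            INSERT INTO Checkouts (StockID, CheckoutDate)
            VALUES (NEW.StockID, CURDATE());
        END IF;
    END//
    DELIMITER ;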

Will creating separate databases in SQL Server give me better performance?

All, I'm a programmer by trade, but for this particular project I'm finding myself being the DBA as well. Here is the scenario I'm faced with:
Web app with anywhere from 400-1000 customers. A customer is a "physical company", each of which has n number of users. Each customer (company) has on average 1GB worth of data (a total of about 200 million rows). Each company has probably 80% similar data in terms of the type of data stored. The other 20% is custom data that the companies can define themselves (basically custom fields).
I am trying to figure out the best way to scale this on the cheap when you consider that the customers need pretty good reaction time. For example, customer X might want to grab all records where last name like 'smith' and phone like '555', whereas customer Y might want to grab all records where account number equals '1526A'.
Bottom line, performance is key and I'm finding it hard to decide what to index and if that is even going to help me given the fact these guys can basically create their own query through the UI.
My question is, what would you do? Do you think it would be wise to break each customer out into its own DB? Total DB size at the moment is around 400GB.
It is a complete re-write so I have the fortune of being able to start fresh if needed. Any thoughts, hints would be greatly appreciated.
Bottom line, performance is key and I'm finding it hard to decide what to index and if that is even going to help me given the fact these guys can basically create their own query through the UI.
Bottom line, you're ceding your DB performance to the whims of your clients. If they're able to "create their own query", then they're able to "create their own REALLY BAD queries".
So, if you run this in a shared environment (i.e. the same hardware), then customer A's awful table scans can saturate the I/O for everyone else.
If they're on the same database server, then Customer A's scans get to flush all of your other customers' data from the data cache.
Basically, the more you "share", the more one customer can impact the operations of other customers. If you give customers the capability to do expensive things, and share much of it, then everyone suffers.
So, the options are a) don't let the customers do silly things or b) keep the customers as separated as practical so that when one does do silly things, the phones don't light up from all of the other customers.
If you don't know "what to index" then you are not offering much control over what the customers can do, and thus the silly thing factor goes way up.
You would probably get quite far by offering several popular, pre-made SQL views that the customers can select from, and then they're limited to simply filtering and possibly ordering the results. Then you optimize around execution of those views.
It's likely that surprisingly few "general" views can cover a large amount of the use cases.
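A sketch of what one of those pre-made views might look like (table and column names are invented for illustration):

    -- Illustrative only; table/column names are invented.
    CREATE VIEW customer_contacts AS
    SELECT c.customer_id,
           c.last_name,
           c.first_name,
           c.phone,
           c.account_number
    FROM   contact AS c;

    -- Customers are then limited to filtering and ordering the view:
    SELECT *
    FROM   customer_contacts
    WHERE  last_name LIKE 'smith%'
           AND phone LIKE '%555%'
    ORDER  BY last_name;

You then index and tune around those few views rather than around arbitrary ad-hoc SQL.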
Generic, silly queries can be delegated to a batch process that runs overnight or during off hours, or to a separate machine that doesn't impact transactional performance, such as a nightly snapshot with "everything but today's data" on it. Let them run historic queries against that.
The SO question How to design a multi tenant database has a link to a decent article on the tradeoffs along the spectrum from "shared nothing" to "shared everything". Also, SO has a tag for those kinds of questions; I added it for you.
Creating separate databases on the same server won't help you get better performance. The performance optimisations available to you with multiple databases are just the same as you can achieve with one database.
Separate databases might make sense for administrative reasons - if different backup or availability requirements apply to different customers for example.
It's still probably sensible to build your application so that it can support multiple databases so that you have the option of scaling out over multiple DB servers.
If you have separate databases, the 80% that is the same becomes almost impossible to keep the same over time. You will end up spending far more money on maintenance.
Luckily, SQL Server has some options for you. First, put the customer-specific information in a separate schema in the same database, and the common stuff in a different schema (create a common schema and a schema for each client).
Next, set up data partitioning by client. This can require the proper hardware to do it effectively.
Now you have one code base for the common part, which will propagate changes to all clients at once, and clients are separated for performance using the partitions.
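A rough T-SQL sketch of that idea; schema names, boundary values, and columns are placeholders, not a finished design:

    -- Placeholders throughout; adjust boundaries and filegroups to your environment.
    CREATE SCHEMA client_a;
    GO
    CREATE SCHEMA common;
    GO

    -- Partition the shared tables by a ClientID column.
    CREATE PARTITION FUNCTION pf_client (int)
        AS RANGE LEFT FOR VALUES (100, 200, 300);
    GO
    CREATE PARTITION SCHEME ps_client
        AS PARTITION pf_client ALL TO ([PRIMARY]);
    GO

    CREATE TABLE common.CustomerRecord (
        RecordID BIGINT        NOT NULL,
        ClientID INT           NOT NULL,
        LastName NVARCHAR(100) NULL,
        Phone    NVARCHAR(30)  NULL,
        CONSTRAINT PK_CustomerRecord PRIMARY KEY (ClientID, RecordID)
    ) ON ps_client (ClientID);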

Best way to build a DataMart from multiple external systems?

I'm in the planning stages of building a SQL Server DataMart for mail/email/SMS contact info and history. Each piece of data is located in a different external system. Because of this, email addresses do not have account numbers and SMS phone numbers do not have email addresses, etc. In other words, there isn't a shared primary key. Some data overlaps, but there isn't much I can do except keep the most complete version when duplicates arise.
Is there a best practice for building a DataMart with this data? Would it be an acceptable practice to create a key table with a column for each external key? Then, a unique primary ID can be assigned to tie this to other DataMart tables.
Looking for ideas/suggestions on approaches I may not have yet thought of.
Thanks.
The email address or phone number itself sounds like a suitable business key. Typically a "staging" database is used to load the data from multiple sources and then assign surrogate keys and do other transformations.
Are you familiar with data warehouse methods and design patterns? If you don't have previous knowledge or experience then consider hiring some help. BI / data warehouse projects have a very high failure rate and mistakes can be expensive.
Found more information here:
http://en.wikipedia.org/wiki/Extract,_transform,_load#Dealing_with_keys
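The "key table" you describe could be as simple as one surrogate key plus one nullable column per external system key; a sketch, with placeholder column names:

    -- Sketch only; column names are placeholders for the external system keys.
    CREATE TABLE ContactKeyMap (
        ContactKey     INT IDENTITY(1,1) PRIMARY KEY,  -- DataMart surrogate key
        EmailAddress   NVARCHAR(255) NULL,             -- business key from the email system
        SmsPhoneNumber NVARCHAR(30)  NULL,             -- business key from the SMS system
        AccountNumber  NVARCHAR(50)  NULL              -- business key from the mail system
    );

The staging process matches and merges rows on whichever business keys are present, then hands the resulting ContactKey to the other DataMart tables.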
Well, with no other information to tie the disparate pieces together, your datamart is going to be pretty rudimentary. You'll be able to get the types of data (sms, email, mail), metrics for each type over time ("this week/month/quarter/year we averaged 42.5 sms texts per day, and 8000 emails per month! w00t!"). With just phone numbers and email addresses, your "other datamarts" will likely have to be phone company names, or internet domains. I guess you could link from that into some sort of geographical information (internet provider locations?), or maybe financial information for the companies. Kind of a blur if you don't already know which direction you want to head.
To be honest, this sounds like someone high-up is having a knee-jerk reaction to the "datamart" buzzword coupled with hearing something about how important communication metrics are, so they sent orders on down the chain to "get us some datamarts to run stats on all our e-mails!"
You need to figure out what it is that you or your employer is expecting to get out of this project, and then figure out if the data you're currently collecting gives you a trail to follow to that information. Right now it sounds like you're doing it backwards ("I have this data, what's it good for?"). It's entirely possible that you don't currently have the data you need, which means you'll need to buy it (who knows if you could) or start collecting it, in which case you won't have nice looking graphs and trend-lines for upper-management to look at for some time... falling right in line with the warning dportas gave you in his second paragraph ;)

Bad real-world database schemas

Our master's thesis project is to create a database schema analyzer. As a foundation for this, we are working on quantifying bad database design.
Our supervisor has tasked us with analyzing a real world schema, of our choosing, such that we can identify some/several design issues. These issues are to be used as a starting point in the schema analyzer.
Finding a good schema is a bit difficult because we do not want a schema which is well designed in all aspects, but a schema that is more "rare to medium".
We have already scheduled the following schemas for analysis: Wikimedia, Moodle and Drupal. We're not sure in which category each fits. It is not necessary that the schema is open source.
The database engine used is not important, though we would like to focus on SQL Server, PostgreSQL and Oracle.
For now, literature will be deferred, as this task is supposed to give us real-world examples which can be used in the thesis, i.e. "Design X is perceived by us as bad design, which our analyzer identifies and suggests improvements to", instead of coming up with contrived examples.
I will update this post when we have some kind of a tool ready.
Check the Dell DVD Store; you can use it for free.
The Dell DVD Store is an open source simulation of an online ecommerce site with implementations in Microsoft SQL Server, Oracle and MySQL, along with driver programs and web applications.
Bill Karwin has written a great book about bad designs: SQL antipatterns
I'm working on a project including a geographical information system. And in my opinion these designs are often "medium" to "rare".
Here are some examples:
1) Geonames.org
You can find the data and the schema here: http://download.geonames.org/export/dump/ (scroll down to the bottom of the page for the schema; it's in plain text on the site!)
It'd be interesting how this DB design performs with such a HUGE amount of data!
2) OpenGeoDB
This one is very popular in German-speaking countries (Germany, Austria, Switzerland) because it's a database containing nearly every city/town/village in the German-speaking region with zip code, name, hierarchy and coordinates.
This one comes with a .sql schema and the table fields are in English, so this shouldn't be a problem.
http://fa-technik.adfc.de/code/opengeodb/
The interesting thing in both examples is how they managed the hierarchy of entities like Country -> State -> County -> City -> Village etc.
PS: Maybe you could judge my DB design too ;) DB Schema of a Role Based Access Control
vBulletin has a really bad database schema.
"we are working on quantifying bad database design."
It seems to me like you are developing a model, or process, or apparatus, that takes a relational schema as input and scores it for quality.
I invite you to ponder the following:
Can a physical schema be "bad" while the logical schema is nonetheless "extremely good"? Do you intend to distinguish properly between "logical schema" and "physical schema"? How do you dream to achieve that?
How do you decide that a certain aspect of physical design is "bad"? Take for example the absence of some index. If the relvar that that "supposedly desirable index" is to be on, is itself constrained to be a singleton, then what detrimental effects would the absence of that index cause for the system? If there are no such detrimental effects, then what grounds are there for qualifying the absence of such an index as "bad"?
How do you decide that a certain aspect of logical design is "bad"? Choices in logical design are done as a consequence of what the actual requirements are. How can you make any judgment whatsoever about a logical design without a formalized and machine-readable way to specify what the actual requirements are?
Wow - you have an ambitious project ahead of you. To determine what is a good database design may be impossible, except for broadly understood principles and guidelines.
Here are a few ideas that come to mind:
I work for a company that does database management for several large retail companies. We have custom databases designed for each of these companies, according to how they intend for us to use the data (for direct mail, email campaigns, etc.), and what kind of analysis and selection parameters they like to use. For example, a company that sells musical equipment in stores and online will want to distinguish between walk-in and online customers, categorize the customers according to the type of items they buy (drums, guitars, microphones, keyboards, recording equipment, amplifiers, etc.), and keep track of how much they spent, and what they bought, over the past 6 months or the past year. They use this information to decide who will receive catalogs in the mail. These mailings are very expensive; maybe one or two dollars per customer, so the company wants to mail the catalogs only to those most likely to buy something. They may have 15 million customers in their database, but only 3 million buy drums, and only 750,000 have purchased anything in the past year.
If you were to analyze the database we created, you would find many "work" tables, that are used for specific selection purposes, and that may not actually be properly designed, according to database design principles. While the "main" tables are efficiently designed and have proper relationships and indexes, these "work" tables would make it appear that the entire database is poorly designed, when in reality, the work tables may just be used a few times, or even just once, and we haven't gone in yet to clear them out or drop them. The work tables far outnumber the main tables in this particular database.
One also has to take into account the volume of the data being managed. A customer base of 10 million may have transaction data numbering 10 to 20 million transactions per week. Or per day. Sometimes, for manageability, this data has to be partitioned into tables by date range, and then a view would be used to select data from the proper sub-table. This is efficient for this huge volume, but it may appear repetitive to an automated analyzer.
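For example, a date-range split plus a view over the sub-tables might look roughly like this (made-up names; real systems have more columns and many more ranges):

    -- Made-up names; one physical table per date range, one view over them.
    CREATE TABLE Transactions_2011_Q1 (
        TransactionID BIGINT PRIMARY KEY,
        CustomerID    INT NOT NULL,
        TranDate      DATE NOT NULL
            CHECK (TranDate >= '2011-01-01' AND TranDate < '2011-04-01'),
        Amount        DECIMAL(10,2)
    );

    CREATE TABLE Transactions_2011_Q2 (
        TransactionID BIGINT PRIMARY KEY,
        CustomerID    INT NOT NULL,
        TranDate      DATE NOT NULL
            CHECK (TranDate >= '2011-04-01' AND TranDate < '2011-07-01'),
        Amount        DECIMAL(10,2)
    );

    CREATE VIEW AllTransactions AS
    SELECT TransactionID, CustomerID, TranDate, Amount FROM Transactions_2011_Q1
    UNION ALL
    SELECT TransactionID, CustomerID, TranDate, Amount FROM Transactions_2011_Q2;

An automated analyzer that flags "duplicate table structures" would complain here, even though the layout is deliberate.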
Your analyzer would need to be user configurable before the analysis began. Some items must be skipped, while others may be absolutely critical.
Also, how does one analyze stored procedures and user-defined functions, etc? I have seen some really ugly code that works quite efficiently. And, some of the ugliest, most inefficient code was written for one-time use only.
OK, I am out of ideas for the moment. Good luck with your project.
If you can get ahold of it, the project management system Clarity has a horrible database design. I don't know if they have a trial version you can download.

How to generate a diagram of a very large database schema (SQL Server)

I have a very large database I need to diagram. The database is SQL Server 2008 on x64. It is large in that there are hundreds of related tables, each with up to 2000 fields (some are sparse), multiple relationships between tables (often hundreds per table, in fact), multiple schemas... you get the idea.
I tried to use the Database Diagrams feature of SQL Server Management Studio, but it crashed with a Win32Exception: "Not enough storage is available to process this command..."
I tried to use Visio's reverse engineering feature on a different machine to connect in and diagram it, but that's been going for a few hours with no sign of completion.
The scripts to build this giant schema are being generated by a tool we built for the job. While the tool is doing its job just fine, it's tricky to visualise its output.
I'm after a tool to kick out a diagram of this database so we can do this. Any suggestions?
EDIT:
Just to emphasize, the diagram is indeed not supposed to be used for actual useful reference. It's a client relationship management device to demonstrate the complexity/scale of the system.
I worked at a place that had several hundred tables (near 1k) and no one really knew what was going on in the system; the company was growing and hiring a lot. A guy was tasked with doing a diagram, and he auto-magically created a gigantic tiled poster that contained every table, with lines connecting various tables (going all over the place). I'm not sure what he used; it was Unix and Oracle years ago (way before Linux and open source). There was no real rhyme or reason to the layout of the tables in his diagram. He had successfully created a diagram of every table. The "poster" was put on a wall in a common area, and got a few looks, but no one ever really used it; it was unusable, too cluttered, too unorganized.
As a result, I used MS-Word to create a single-page diagram containing the 20 main tables (it went through a few iterations as I "discovered" new main tables), with lines for each foreign key and each table located in a logical manner. I showed the column name, data type, nullability, PK, and all FKs. I put my diagram up on my wall by my monitor. Eventually everyone wanted a copy of my diagram, including the person that made the "poster". When I left that job they were still giving my diagram to new hires.
I recommend that you work like an explorer, find the key tables and map them as you go, making as many specific diagrams as necessary as you discover the system. Trying to make a gigantic "poster" automatically will not work very well.
Generating an image of any kind for a database of that size simply becomes eye candy that is stuck on a wall, draws gasps, and honestly serves no real purpose except occasional glances. Why not use a tool like Red Gate's documentation tool that will serve an actual purpose? Please understand I'm not saying this in a mocking way, but I've been down this road before trying to diagram a huge database, and I succeeded to some degree, but never found a good outlet where it was of some use.
Since you have multiple schemas, maybe a good idea is to generate diagrams per schema instead.
Use graphviz. Use some SQL statements to generate the diagram, then run it through dot.exe to generate a PDF or PNG.
I've used it to generate diagrams of data within SQL Server tables. No reason why you can't use it for tables too.
http://www.graphviz.org/
There are also Java, Silverlight, and AJAX utilities for navigating extra-large graphs, as PDF is only for one page.
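As a sketch of the "SQL statements to generate the diagram" step, something along these lines against SQL Server's catalog views emits one DOT edge per foreign key; wrap the output in a digraph block and feed it to dot.exe:

    -- Emits one DOT edge per foreign key; wrap the rows in "digraph schema { ... }".
    SELECT '"' + OBJECT_SCHEMA_NAME(fk.parent_object_id) + '.'
               + OBJECT_NAME(fk.parent_object_id)
           + '" -> "'
           + OBJECT_SCHEMA_NAME(fk.referenced_object_id) + '.'
               + OBJECT_NAME(fk.referenced_object_id)
           + '";'
    FROM sys.foreign_keys AS fk;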
I'd avoid doing the whole thing in a single diagram. As you mentioned, the tools crash, and it's probably not possible to easily comprehend a diagram with hundreds of tables and potentially thousands of columns per table. Can you generate diagrams of smaller logical areas, with some overlap to other logical areas?
Alternately, you could try using something like graphviz to parse the DDL statements and then produce a graph. It will probably churn for a while, but I remember seeing poster-sized diagrams with tiny print at a university that were probably of the same complexity as yours. Good luck!
FWIW, assuming you do want to go ahead with this, I've personally found that the Visual Studio 2010 database modeller does the nicest diagrams I've come across so far - just import your database as if you were going to use it for Linq2SQL.
schemaspy provides a handy interface to generate interactive diagrams that span multiple schemas, using graphviz as a backend. I've never tried it on anything this size, though.
IntelliJ has a built-in database client facility (I tried this specifically in IDEA, but I believe their other IDEs offer the same feature: https://www.jetbrains.com/). From there you can connect to your database and analyse individual tables, a specific combination of tables, or all of your tables, by highlighting the desired tables, right-clicking, and selecting the 'diagram' option. You can save the result for later reference and also print it. I have just tried this on a large DB of 500+ tables and it rendered in seconds; the vector diagram serves as an alternative way to digest database structures visually, along with the relationships and constraints between certain tables, but it is not recommended for printing.
