Skip to Content

Database Design - A real world example

Posted in

As mentioned in my last post, I have a need to develop a relatively simple issue tracking system. This seemed to be a perfect time to explore the system development process in detail. Today, we'll look at how to "design" a database.

If you do not know how to use SQL to create tables, or do not understand what a relational database is, then these would be good starting points for you. You might be able to follow along with the discussion, but some the of the commands and terminology may cause you some pause. BUT, I'm being very verbose with my write up here just for people in this situation. No, I'm not going to teach you the SQL commands here, but if you are at that stage, you may not have a clear idea how to "design" a solution to a problem. So we'll explore the process I went through in more detail than usual. Experienced developers may just want to skim through the article and check out the SQL commands.

When designing a database (or desinging a software program), it's best to start with a clear description of what you are trying to do. This description lays the foundations for your design. So here's a description of the application we are trying to create:

  1. The system must handle multiple projects
  2. The system control user access to only authorized projects
  3. Each project can have issues associated with it
  4. Each issue will have a state (reported, in progress, complete, etc.)
  5. Each issue may have one or more progress notes associated with it
  6. Each issue will be given an issue type (bug, enhancement, etc.)
  7. States must be configurable (allow additional states, in any order, as needed)
  8. Users may wish to receive email notification when issues meet certain states

Now, all 2 of my faithful readers (Hi Mom!!), may be noticing that this list is taken from my last post, but there are a couple of changes. It is important to understand that the definition of your problem is a dynamic thing. It will change as the you work your way through the solution. In terms of data models, it will not be finalized until the day the project is done. Why? Well, a data model is a reflection of our understanding of a problem, and our understanding is based on our description. But as we start documenting the problem in detail, we begin to see the little things that need to be indicated and/or changed. This process happens at every single step through the development of a software solution. It's always best to spend as much time up front to build a solid plan that foresees as many issues as possible, as this will prevent some serious changes later. But there will always be something that just can't be done as expected, or the planned method turns out to be less efficient, or maybe redundant, or maybe just wrong. The end result is that our definition of the problem and it's sub issues will change over the life of the project. So our data model will also change somewhat.

Really though, the changes needed are usually very minor if some time has been spent in working out the plan. And these changes are typically a direct result of an unforeseen requirement.

In the list above, a clear example of this is the concept of "issue type" When I began designing the sytem, I was only picturing an "issue" as an abstract thought. However after looking at it for a bit, and reflecting on the types of issues my customers throw at me, I realized that I was giving higher priority to isses considered as a bug, than those considered an enhancement. (An enhancement is a change to existing code to meet a NEW need - or put another way, a change to meet a need that was not previously thought about.) So I now had a new requirement for the issue tracking system - an issue could be a bug or an enhancement. And those two types only meet my immediate needs. If anyone else uses this system, then they may have other types they need to track. So let's make the idea of an issue type a variable thing (i.e. we can add more as needed, without rewriting code).

Ok, back to the real stuff. When we look at our system definition (aka our requirements), we have a lot of useful information available to us. First, the organization of the data is more or less laid out. Sure we will store "issues", but each of them must belong to a project. Seeing as our project seems to be the first bit of data we'll need, lets start there.

Our requirements are a little lacking in details. We know we need information for a project, but we don't know what information. If I were doing this for a customer, that would be my next question to them. I would get them to give me enough detail to fill out our requirements so there would be no question what project info we need. But for smaller projects, and especially those we do for ourselves, we don't really need to be THAT formal. (As a rule of thumb for me, my personal projects tend to be informal, but my business projects tend to be very formal and follow the software development life cycle, or SDLC).

In this case I simply jotted down some ideas of what information I needed for a project:


- Project Name
- When the project started
- When the project finished

Sure we could specify more info, but that's the basics of what I need. If I had a possibility of more than one project with the same name but for different customers, then I might need to add more detail here, like a customer name. But my needs are pretty basic, so I'll assume that IF this ever happens, I'll just indicate which customer in the project name. And with that, we have enough information to build our first table.


CREATE TABLE issues_projects (
project_id INT NOT NULL AUTO_INCREMENT,
project_name VARCHAR(200),
start_date DATETIME,
end_date DATETIME,
created_by INT,
created_on DATETIME,

PRIMARY KEY (project_id)
);

Now, I'm including more information here than we've talked about thus far. So I better explain. First, I know I'm building a Drupal module, and the table will be built in Drupal's database (which happens to by MySQL in my case). So, first, I'm using MySQL specific SQL statements (but they're generic enough it shouldn't matter much if someone wants to apply this to another database system). Next I know that some of the table names I'm likely to pick could conflict with other Drupal tables or other modules. So I'm trying to avoid duplicate names by indicating what module the table belongs to, followed by the real name of the table. I'm tentatively calling my module "issues" just because I'm so imaginative. So my tables will all be called "issues_tablename", and the tablename part will change as needed. The names I've chosen are arbitrary and could be anything we wanted really. But it's always best to use names that describe what the table/item is for.

Next, I've introduced a primary key. This will simply be a unique integer ID for every one of our projects. I've specified the AUTO_INCREMENT option, which is MySQL specific, to automagically handle what number to put in here. (Side note: I've seen reference in the Drupal documents that a Drupal specific routine should be used to handle this, so we might need to revisit this table definition at a later date.)

Finally I've included a field to track when a project was created, and who did it. This is not really needed, but it has become habit for me due to some of the projects I've done in the past. The question always comes up "who created that record?", and its nice to have an answer when it does.

If you look at the datatype for each of the field you'll see I have chosen something that makes sense for the field. For example, the project name is text, so I picked a "varchar" datatype. The size of the varchar is arbitrary, but I wanted to have space for some longer names, so I picked a maximum of 200 characters. Fields that should be dates, have a datatype of "datetime", fields that should be numbers are "integers" (if I needed decimal values I would have used "float". The exact data type you use and the size of the fields should be set to something suitable for the field. Other than that one requirement, take your best guess - the nice part of designing is that WE get to choose what the "right" answer is. But you should learn as much as possible regarding databases and your project so that you can make informed decisions.

Now we can look at the next table we'll need - the issues. This is a little more complex. If we review our original definition again we see that we don't have enough info for an issue, but we do have some. We know that an issue can have a state, and an issue type. This is where the complexities come in (well, not tooooo complex, but could be seen as odd if you don't really understand the "relational" part of databases). We know that we have to be able to add new states and issue types as needed, and this suggests an individual table for each. That would allow us to add another state record, or issue type record when we want. And if we use a primary key for these tables to relate a specific record to an issue, we have a nicely "normalized" database.

Another related bit of information is progress comments for an issue. Beacuse we can have more than one comment per issue, we'll make comments a separate table.

So, here's the table definitions for an issue, states, and issue_types:


CREATE TABLE issues_issues (
issue_id INT NOT NULL AUTO_INCREMENT,
project_id INT,
issue_type_id INT,
date_added DATETIME,
date_done DATETIME,
state_id INT,
issue_title varchar(200),
description TEXT,
created_by INT,
created_on DATETIME,

PRIMARY KEY (issue_id)
);

CREATE TABLE issues_notes (
note_id INT NOT NULL AUTO_INCREMENT,
issue_id INT,
note_date DATETIME,
comment TEXT,
initial_state INT,
new_state INT,
created_by INT,
created_on DATETIME,

PRIMARY KEY (note_id)
);

CREATE TABLE issues_issue_types (
issue_type_id INT NOT NULL AUTO_INCREMENT,
issue_type VARCHAR(50),
weight SMALLINT,
created_by INT,
created_on DATETIME,

PRIMARY KEY (user_type_id)
);

/* close flag indicates if the state should mark an issue closed */
CREATE TABLE issues_states (
state_id INT NOT NULL AUTO_INCREMENT,
state VARCHAR(50),
weight SMALLINT,
close_flag SMALLINT,
created_by INT,
created_on DATETIME,

PRIMARY KEY (state_id)
);

You can see that each table follows the same naming standard we defined earlier, has a primary key, and our extra "created" info. The only real new parts here are how we are linking the tables. You'll notice that an issue has a project_id field, which is the Foreign Key to the project table, and then corresponding Foreign Keys for issue_type, and state. A comment is linked to the issue by the issue_id. A comment also has two fields which should be explained more - the "initial_state" and "new_state" fields. I've included these because I'd like to know what the state was when a comment was entered, and if the comment was used to change the state. This is more of a software design issue, but the table has to support that ifno somehow. (I might not implement this fully, but it's a nice thought for now.)

The issue_types and states tables each have a "weight" field. This is expected to be a number used to sort the records. The phrase "weight" was chosen simply because it's used in Drupal for just this purpose, but I could have easily called this "sort order", "ordinal", or something else. If we think about it a little, we do not always want to sort things alphabetically. For instance our states could be "reported, accpeted, in progress, completed, cancelled" and we might want them sorted in this order. So wewould set the "weight" field for each record to force this order. More to the point, the SQL statement will sort by weight, then alphabetically.

So far, we have tables to meet almost all of our requirements. There are two outstanding still - access control to the projects, and notifications.

Access control is simply saying "Bob can see project X". If we were building this as a standalone application, then we would need to build a user table. But Drupal already has one for us, and each user is identified by a "uid" value, which is the primary key for the user. So we need a nice simple table that links a user ID to a project:


CREATE TABLE issues_user_projects (
uid INT,
project_id INT,
created_by INT,
created_on DATETIME,

PRIMARY KEY (uid, project_id)
);

The only real difference here is that our primary key is the uid field AND the project_id field, which are both foreign keys to their respective tables. The combination of a uid and project_id will ALWAYS be unique. If Bob has access to three projects, then we'll see combinations like "3,5", "3,6", and "3,7" - assuming Bob's uid is 3. If we add Bob to project_id 6 again, we'll find he already has permission. This is what is known as a "composite" key. In practice though we just need to worry that the uid/project_id combination has to be unique.

So with this table we can now get a list of all the projects a particular user id is allowed to view, and then from the project_id's we can get the project details and the related issues. As long as we refer to this table when dealing with projects and issues, we can safely restrict access to our application.

And finally we have the notification requirement. We want to be able to say that when an issue meets state X, an email will be sent to all users who have access to the project, and want to know the state has been met. That sounds more complex than it really is:


CREATE TABLE issues_notifications (
notice_id INT NOT NULL AUTO_INCREMENT,
uid INT,
state_id INT,
created_by INT,
created_on DATETIME,

PRIMARY KEY (notice_id)
);

Here we are simply matching a user id to a state id. We can assume the software will know which users have access to the project from a separate routine, and then it can just check those user ids against this table to see if an email should be sent for the newly set state of the issue. This table simply supports some of the software processing.

And there you have it. One data model that can be implemented for this project. The design process, which may seem so mysterious to some, is really just a way to document how we are thinking about the task at hand and the application of some background knowledge, in this case it is relational databases. Can we make this more efficient? Probably. Do we care right now? Not really - we should focus more on building something that works AND that we can understand. Once we have that, then we can look at optimizations if needed. Also, to make this more efficient, we'd likely need to change the way we are thinking about the project, which means we'll probably need to change the data model and software. And the benefits may be irrelative anyways.

The process we followed here is the same thought exercise I go through everytime I start a new project and/or task. The process applies just as well to software design as database design. Sure the details are different, but how I arrive at a "design" is very similar. After you've been doing this for a while you can get to the point where you don't even think about it. These tables were designed in my head in about 30 seconds, but without all the supporting detail. It's the relationship of the information that is more important. When you've been doing it A LOT, it's done without thinking, and people think you're guru when you are just applying the same routine over and over and over and over and....

For those new to design, I truly hope this helps highlight the process.

See the complete SQL file.
(Note: should be a .SQL file, but restrictions with WordPress prevent the upload of a .SQL file.)