MIME-Version: 1.0 Content-Type: multipart/related; boundary="----=_NextPart_01C47965.0F249510" This document is a Single File Web Page, also known as a Web Archive file. If you are seeing this message, your browser or editor doesn't support Web Archive files. Please download a browser that supports Web Archive, such as Microsoft Internet Explorer. ------=_NextPart_01C47965.0F249510 Content-Location: file:///C:/2D195E31/Creating_Database_sop_3nov01.htm Content-Transfer-Encoding: quoted-printable Content-Type: text/html; charset="us-ascii" www

 <= /o:p>

 

 

Creating Database Schema Standard Operating Procedure

 

Last Updated: 3 November 2001<= /p>

 

 

 

Overview. 2<= o:p>

Table Creation. <= /span>2<= o:p>

Table Name Guidelines 2<= o:p>

Prefixing. 2<= o:p>

Case and Delimiter 2<= o:p>

Nouns and Adjectives 2<= o:p>

No Plurals 3<= o:p>

Be Verbose But Not Too Verbose. 3<= o:p>

Breaking a Table Into Two. 3<= o:p>

Break a Table Into Three. 3<= o:p>

Lookup Tables 3<= o:p>

Join Tables 4<= o:p>

Key Tables 4<= o:p>

Column Name Guidelines 5<= o:p>

Case and Delimiter 5<= o:p>

Nouns and Adjectives 5<= o:p>

No Plurals 5<= o:p>

Be Verbose But Not Too Verbose. 5<= o:p>

Fields Indicating Number of Items 6<= o:p>

Fields Indicating Date. 6<= o:p>

Fields Indicating Boolean Truths 6<= o:p>

Id Fields 6<= o:p>

Data Type and Size Guidelines 6<= o:p>

Text/Character = 6<= o:p>

Boolean. 7<= o:p>

Money. 7<= o:p>

Numeric. 7<= o:p>

Date/Time. 7<= o:p>

LOBs = 8<= o:p>

Sample Table Structure. 8<= o:p>

Visio. 8<= o:p>

Storing the SQL 8<= o:p>

Defining Relationships and Optimizations 8<= o:p>

Relationships = 8<= o:p>

Indexes = 9<= o:p>

Sybase Indexes 9<= o:p>

Oracle Indexes 9<= o:p>

MySQL Indexes 9<= o:p>

PostGreSQL Indexes 10<= o:p>

Permissions = 10<= o:p>

 

 

 

 

 

 

Overview

 

The core of any software development project at the_co= mpany usually involves the storage and subsequent retrieval of data. Generally th= is is accomplished through an RDBMS (Relational DataBase Management System).

 

Thus, one of the first steps in a project involves com= ing up with a database schema for the project. A schema consists of the tables, datatypes, indexes, relationships, and permissions for the data stored in t= hat project.

 

This SOP’s scope is to cover the steps and gener= al guidelines for creating a database schema at the_company.

 

The basic steps are to first identify the data and relationships, but the second step is to take that data and map it to table names, column names, and datatypes.

 

Once these are determined, relationships and optimizat= ions including indexes, and permissions may be granted on these tables.

 

Table Creation

 

The creation of tables involves a great deal of though= t. Most of the subjective issues resolve around naming so we tackle this first= . To some degree, the naming of fields and tables is similar to the conventions = of naming classes and attributes in Java or Perl.

 

The rest of the table creation involves the more synta= ctical issues such as how to deal with different data types especially amongst different databases. We will note database specific issues elsewhere.

 

Table Name Guidelines

 

Prefixing

 

The first rule of naming tables at the_company is that= they should be prefixed with a reasonable project abbreviation. For example, tra= de finance tables prefixed with “TF”, Banking Single Sign On tables with “BSSO”, and so on.

 

Case and Delimiter

 

All table names are uppercase. Because propercase is n= ot used to distinguish between major words, underscore “_” must be used instead. For example, TF_LETTER_OF_CREDIT.

 

Nouns and Adjectives

 

The table name should at minimum be a noun indicated t= he type of set it is storing. If the noun is not descriptive enough it may als= o be preceeded by one or more adjectives.

 

For example, if there is one transaction table, the ta= ble might be called TRANSACTION. But if there is more than one transaction table because the data model for approved and rejected transaction storage is dissimilar then there would be two tables: APPROVED_TRANSACTION and REJECTED_TRANSACTION.

 

No Plurals

 

A table is a set of something that may contain not onl= y many elements of that set but also 0 or 1. It is not appropriate to name a set a= s a plural. For example, do not call a table ADDRESSES, call it ADDRESS.

 

Be Verbose But Not Too Verbose

 

As long as the table name does not break any hard data= base constraints, you should try not to use abbreviations for table names. The r= ule of thumb is that anything over 10-15 characters is a bit long and anything under is definitely OK.

 

For example, ADDRESS_BOOK is fairly long and could be shortened to ADDRESS but ADDR or another abbreviation is confusingly small. People find whole words easier to remember unless the abbreviation is widely accepted (eg ID short for identification).

 

Breaking a Table Into Two

 

Occasionally you may run into a database constraint th= at forces you to break a table into two 1-1 (one to one relationship) tables. = If this is the case, the second table may carry a name similar to the first one but with the word “_DATA” appended to it.

 

This would indicate that this table contains further d= ata about the previous table. For example, in our trade finance application, the letter of credit application has too much data in it, so we broke it out in= to TF_LC which stores the core LC information that is most often looked up and= the least often looked up extraneous (but still important) data is stored in TF_LC_DATA.

 

Break a Table Into Three

 

Sometimes the breakage is not very even when breaking a table into two independent entities with 1-1 relationships. In this case, t= he other table might still be too large. Therefore, it is still time to break = them up even further. One rule of thumb is to potentially name the subsequent ta= ble with a “DESC_DATA” prefix indicating that all text and blob fie= lds belong in this table while data that is more likely to be directly queried (primitive data types) for efficiency purposes remain the original split out “DATA” table.

 

Lookup Tables

 

Lookup tables are tables that exist to provide a descr= iptive name for some id or abbreviation field. For example, the types of address b= ook entries might consist of P for Personal, B for Business, and PB for Both. In this schema, there should be a lookup table which can map these abbreviatio= ns to their descriptive name.

 

Usually these lookup tables are suffixed with a word (= eg DESCRIPTION) that indicates the type of lookup they are doing such as CATEGORY_DESCRIPTION (a lookup of category descriptions) or PRODUCE_CODE_DESCRIPTION (a lookup of product code descriptions).

 

Join Tables

 

Another type of common table used in database schemas = is a “join” table. In other words, a join table is a table whose pur= pose is to join two other tables together into a M-N (many to many) relationship= .

 

For example, you might have a table of groups whose un= ique identifier is a group id and a table of addresses whose unique id is address_id. But to join them together (associate more than one group to more than one address) we need a new table to join the two together whose sole fields are address_id and group_id (used to join the two together).

 

The name of these tables should consist of the main en= tity followed by the second entity followed by the “MAP” suffix. For example, if ADDRESS and GROUP are joined together via a join table, this ta= ble name would be ADDRESS_GROUP_MAP. Since the table maps the two tables against each other.

 

Key Tables

 

Sybase is one of the few databases that does not have = a nice clean mechanism for autoincrementing fields. While it does have an identity column type that suffices for representing field uniqueness, it frequently jumps in value (eg +50000 at a time) when the database is shutdown and restarted (as part of its burn rate).

 

This is reasonable if the identity field is just used = to determine uniqueness but it is horrible when the id field is combined with a calculation to provide some deterministic transaction id or a value to give back to the user so that they can write down the id for future lookup.

 

Therefore, in Sybase, for schemas that contain tables = with sensitive ids we create a key table to store the key to increment. Then, th= e_company’s Java DataSource can be configured to call a stored procedure which incremen= ts the key value and returns it in a single transaction for applying to an “add” operation on that datasource.

 

The key table is usually called “KEY_TABLE”= ; and is prefixed with the normal project abbreviation. For example, the key table for Trade Finance is called “TF_KEY_TABLE”. It contains two columns: tablename and lastkey. “tablename” stores the table the key is being incremented for and “lastkey” contains the last autoincremented value.

 

Then, a stored procedure is created in the database ca= lled “GET_NEXT_KEY” (TF_GET_NEXT_KEY in the case of Trade Finance). This stored procedure is us= ed to retrieve the next key. The following lists the sample SQL code used in Sybase to create these two tables where X is the project name:

 

use X

go

create table X_KEY_TABLE (

  &nbs= p;    tablename     varchar(25),=

  &nbs= p;    lastkey        &= nbsp;     int

)

go

create procedure X_GET_NEXT_KEY(@TABLE_NAME varchar(25)) as

begin

        begin tran

        declare @LAST_KEY integer

 

 

        update X_KEY_TABLE

        set lastkey =3D lastkey + 1

        where tablename =3D @TABLE_NAME

 

 

        select @LAST_KEY =3D last

 

         key

        from X_KEY_TABLE

        where tablename =3D @TABLE_NAME

 

 

        commit tran

 

 

        select @LAST_KEY

        return @LAST_KEY

end

go

 

Column Name Guidelines

 

Case and Delimiter

 

Field/Column names in tables should be lowercase to distinguish them from table names. Like table names though, whole words sho= uld be separated via the underscore (_) character.

 

Nouns and Adjectives

 

A column should indicate that it is an attribute of the table entity which generally is a noun like “color” or “name”. If there is more than one of a single entity, an adject= ive may be used to prefix the noun to give it more meaning (eg last_name or first_name).

 

No Plurals

 

Ideally, in a relational database, a multiplicity of v= alues belonging to a field should be accomplished via a 1 to many relationship wi= th another table. Thus, very rarely (if at all) should you use plural field na= mes as this would imply multiplicity of values.

 

Be Verbose But Not Too Verbose

 

As with table names, field names should not make use of abbreviations unless those abbreviations are extremely well accepted such as id. Generally, do not use an abbreviation under most circumstances unless t= he field name is above 10-15 characters long already.

 

A commonly accepted abbreviation for number is num. We= do not accept no for an answer. (haha). In other words, please do not use the = word “no” as an abbreviation for number as it is too easily misread = and looks quite odd. Traditionally when “no” has ever been used as = an abbreviation it is clearly followed by a “.” but column names h= ave no such punctual embellishments. So please stick with “num” for abbreviations of the word “number”.

 

Fields Indicating Number of Items

 

Fields indicating the number of items should begin with “number_of” or “num_of”. For example, “num_of_pelicans”. Where a number is representative of an id nu= mber such as “IC number”, then “ic_num” or “ic_num= ber” may use num as a suffix instead.

 

Fields Indicating Date

 

Fields indicating some sort of date or time variable s= hould have the intended data value suffixed. For example, birth_date would intend= to store just a date. But last_mod_datetime would intend to store the date as = well as the time for auditing purposes.

 

This is important because many databases do not have separate distinctions between granularity of time and it is important to ha= ve the table relatively self documenting in this respect lest someone look at a field and attempt to use it in an odd way that would either corrupt the dat= a or result in a faulty date arithmetic calculation.

 

Fields Indicating Boolean Truths

 

Fields indicating Boolean truths should either suffix themselves with the word “flag” or prefix themselves with the w= ord “is”. For example, for the “on” attribute of a lightbulb table, you could use “is_on”.

 

Similarly, if there is a security flag indicating whet= her email will be sent for this record or not, then that field could be referre= d to as “send_email_flag”.

 

Id Fields

 

Id fields should never just be left as the word “id”. Although I realize some people are proponents of this, mo= st find this to be confusing in practice to have all primary keys called “id” and then renamed to something more descriptive when used a= s a foreign key.

 

By maintaining the same name of the id field in all ta= bles regardless of primary of foreign key use, the relationship between the tabl= es is easier to see regardless of lines drawn explicitly linking them up.

 

For example, in the address table, the id should be ca= lled “address_id” not simply “id”. Then, in the address_group_map table, the address_id is still called “address_id” in order to reinforce that this table is linked to= the address table.

 

Data Type and Size Guidelines

 

Text/Character

 

There are usually 3 major types of text that can be st= ored in a database. Free length text, char data and varchar data.

 

The basic rule of thumb is that varchar data is usuall= y used on data that is variable in size but still fairly small such as “last_name”. Char is used when the data is small and usually fi= xed in length such as “category_code” (eg maybe it is a 2 character field). And finally, descriptive text is usually stored in a free form text field.

 

Free length text fields have no limits but should be u= sed sparingly. Most databases are unable to index them so they are inefficient.= In addition, most database APIs make you go through a lot of extra programming= to retrieve or set more than some maximum limit of data (eg 4k). Most databases cannot perform the same operations on free length text fields.

 

Sybase max data size for char and varchar fields is 25= 5 and text fields can be any size.

 

Oracle has a max data size for varchar fields that is = larger (around 1000?). To store large amounts of text, a LONG should be used (long= is not a long integer in Oracle, it is a Long set of text… go figure). M= ore recently Oracle has introduced CLOB (Character Large Object), which is supp= osed to be more efficient, but it also is harder to program so we don’t recommend its use.

 

Boolean

 

Not all databases support a Boolean datatype. In this = case, we recommend choosing the smallest integer possible and using the value 0 f= or false, 1 for true, and NULL to mean NULL (as null should always be). Didn’t your mother ever tell you? NULL MEANS NULL!!

 

Money

 

Fields dealing with monetary values should under almos= t no circumstances ever be stored as a float or double value. These data types merely approximate decimal point values. It is possible for money data to change value if stored in these fields.

 

For Sybase, we recommend storing money in either the m= oney data type or a numeric type with a precision of 2.

 

Numeric

 

If a field is a whole number, use an integer style data type, otherwise use a float unless the decimal precision is vital to mainta= in – in which case try using decimal, numeric, money or some other data = type that maintains a precise value beyond the decimal point.

 

Date/Time

 

If the database you are using supports only date and t= ime, then you must store dates and times inside the larger value. Likewise, if t= hey do not support the larger date and time value but you want to store both, y= ou may need to split the field into two.

 

Timestamps are a different issue. Timestamps are usual= ly not supposed to be represented as true date/time data but rather a unique times= tamp roughly equal to the millisecond that the user last changed the record. One= of the few databases which has a real date based timestamp is MySQL. However, = be wary in MySQL – only one timestamp field can exist in a table. Subseq= uent timestamps beyond the first one will never be edited.

 

LOBs

 

LOBs are Large Objects. Usually they are known as BLOBs (Binary Large Objects) and are designed to store arbitrary file data, often images. Unless necessary, we recommend staying away from LOBs because retri= eving and storing their data is usually very database specific and quite problema= tic in both Java and Perl.

 

Sample Table Structure

 

The following is a sample address book table in Sybase= with data types.

 

Column

Datatype

address_id

numeric (10,0) identity (for auto inc)

name

varchar(35)

phone

varchar(15)

age

integer

can_drive

binary (for Boolean)

notes

text (for free length data)

salary

money

birth_date (suffixed with just date to show we don&#= 8217;t care about the exact time)

datetime

 

Visio

 

Throughout the entire experience of creating tables, y= ou shouldn’t be creating them in SQL. You should first create them in Vi= sio through the DB Schema stencils. Once you have DB Schema stencils, you shoul= d do a RFC (Request for Comment) amongst your project peers (or outside the proj= ect and within localstaff if you are the only project developer).

 

Once everyone has agreed, then you should write the SQ= L.

 

Storing the SQL

 

The SQL (Structured Query Language) for defining a set= of tables should go into the CVS tree for that project under the “sql= 221; directory and the name of the file should end with a “.sql” extension.

 

Defining Relationships and Optimizations<= /h1>

 

Once these various table options are determined, relationships and optimizations including indexes, and permissions may be granted on these tables.

 

Relationships

 

The main relationships most developers usually wish to represent in a database schema are primary/foreign key relationships. Unfortunately, not all databases support declarative referential integrity.=

 

For those that do, you should be careful about using declarative integrity. Some databases are notoriously inefficient at such things (eg Sybase). Oracle seems reasonably fast with referential integrity. Traditionally the reason Sybase did not support it at first (or well) was because triggers were used to emulate the referential integrity and indeed = with a much more sophisticated ruleset actually being possible using triggers.

 

Unfortunately, triggers are highly database dependent.=

 

Indexes

 

You should always consider adding indexes to your tabl= es. At minimum the primary, foreign key relationships should be indexes so that the joins can be done efficiently.

 

Second, you should always consider creating indexes on= the field groups that are commonly queried the most.

 

Some databases implement indexes in different ways and= this may make a difference in which indexes will be created.

 

Sybase Indexes

 

Sybase, for example, supports the concept of clustered versus nonclustered indexes. Clustered indexes are highly efficient because they eliminate one layer of the index lookup by making the leaves of the index… the table itself! Thus, the table is always maintained in the order of the index.

 

The disadvantage with a clustered index, is that Sybase locks records by whole page, and thus using a clustered index on a primary = key of the table can sometimes result in lock contention on the last index page= if your database has many adds. Usually this is not a problem for most databas= es and using a clustered index on the primary key is efficient for joins to th= at table.

 

In Sybase, we tend to create the index such that it us= upper case, with whole words separated by underscore (_) using the column names making up the index followed by the word “IDX”.

 

For example, an index for address_id on an ADDRESS tab= le would be created using the following syntax:

 

create clustered index ADDRESS_ID_IDX on ADDRESS(address_id)

go

 

By default indexes are created uniquely. If you want a nonclustered index, you can specify the word “nonclustered” in place of clustered up above.

 

A sample nonclustered index created on first and last = name of an ADDRESS table would look like the following:

 

create nonclustered index FIRST_NAME_LAST_NAME_I= DX on ADDRESS(first_name, last_name)

go

 

Oracle Indexes

 

Not Applicable Yet.

 

MySQL Indexes

 

Not Applicable Yet.

 

PostGreSQL Indexes

 

Not Applicable Yet.

 

Permissions

 

Permissions in development are frequently left open. R= ather than assign permissions, we usually try to alias the user to the owner of t= he tables.

 

In Sybase, this is accomplished by using sp_addalias t= o add the login to the database as dbo. For example, sp_addalias myloginid, dbo w= ould alias myloginid to the dbo of the database I executed that command inside.<= /p>

 

In production, it is reasonable to close the database = and use specific grant statements to grant the appropriate rights on the application login id to the database tables. Grant statements are fairly cross-database friendly.

 

 

 

END

 =

------=_NextPart_01C47965.0F249510 Content-Location: file:///C:/2D195E31/Creating_Database_sop_3nov01_files/header.htm Content-Transfer-Encoding: quoted-printable Content-Type: text/html; charset="us-ascii"





PAGE  <= span style=3D'font-family:Verdana'>

 

PAGE  10

 

------=_NextPart_01C47965.0F249510 Content-Location: file:///C:/2D195E31/Creating_Database_sop_3nov01_files/filelist.xml Content-Transfer-Encoding: quoted-printable Content-Type: text/xml; charset="utf-8" ------=_NextPart_01C47965.0F249510--