Sam's SQL Server blog: Lookup Table

Monday, 11 December 2006

SQL Server Index Creation Guide Line

Index creation guide

Indexes should be considered on all columns that are frequently accessed by the WHERE, ORDER BY, GROUP BY, TOP, and DISTINCT clauses.

Only add indexes if you know that they will be used by the queries run against the table

As a rule of thumb, every table should have at least a clustered index. Generally, but not always, the clustered index should be on a column that monotonically increases--such as an identity column, or some other column where the value is increasing--and is unique. In many cases, the primary key is the ideal column for a clustered index

Static tables (those tables that change very little, or not at all) can be more heavily indexed that dynamic tables (those that are subject to many INSERTS, UPDATES, or DELETES) without negative effect.

If you will be creating an index to speed the retrieval of a single record, you may want to consider making it a non-clustered index, and saving the clustering index (you can only have one) for a more complex query

Don't over index your OLTP tables, as every index you add increases the time it takes to perform INSERTS, UPDATES, and DELETES

Don't add the same index twice on a table.

Whether an index on a foreign key has either high or low selectivity, an index on a foreign key can be used by the Query Optimizer to perform a merge join on the tables in question. A merge join occurs when a row from each table is taken and then they are compared to see if they match the specified join criteria. If the tables being joined have the appropriate indexes (no matter the selectivity), a merge join can be performed, which is generally much faster than a join to a table with a foreign key that does not have an index.

If you have two or more tables that are frequently joined together, then the columns used for the joins should have an appropriate index.

When creating indexes, try to make them unique indexes if at all possible. SQL Server can often search through a unique index faster than a non-unique index because in a unique index, each row is unique, and once the needed record is found, SQL Server doesn't have to look any further.

Ways boost the performance of a query that includes an AND operator in the WHERE clause, consider the following:

Of the search criterions in the WHERE clause, at least one of them should be based on a highly selective column that has an index.
If at least one of the search criterions in the WHERE clause is not highly selective, consider adding indexes to all of the columns referenced in the WHERE clause.
If none of the column in the WHERE clause are selective enough to use an index on their own, consider creating a covering index for this query.
The Query Optimizer will always perform a table scan or a clustered index scan on a table if the WHERE clause in the query contains an OR operator and if any of the referenced columns in the OR clause are not indexed (or does not have a useful index). Because of this, if you use many queries with OR clauses, you will want to ensure that each referenced column in the WHERE clause has an index.
The Query Optimizer converts the Transact-SQL IN clause to the OR operator when parsing your code. Because of this, keep in mind that if the referenced column in your query doesn't include an index, then the Query Optimizer will perform a table scan or clustered index scan on the table.
Create composite indexes with the most restrictive column first.

Don’t add the index under the following cases

If it is not used by the query optimizer. Use Query Analyzer's "Show Execution Plan" option to see if your queries against a particular table use an index or not. If the table is small, most likely indexes will not be used.
If the column values exhibit low selectivity, often less than 90%-95% for non-clustered indexes.
If the column(s) to be indexed are very wide.
If the column(s) are defined as TEXT, NTEXT, or IMAGE data types.
If the table is rarely queried.

Tuesday, 5 December 2006

Dimension Data Modelling

Dimensional data model is most often used in data warehousing systems. This is different from the 3rd normal form, commonly used for transactional (OLTP) type systems. As you can imagine, the same data would then be stored differently in a dimensional model than in a 3rd normal form model.

To understand dimensional data modeling, let's define some of the terms commonly used in this type of modeling:

Dimension: A category of information. For example, the time dimension.

Attribute: A unique level within a dimension. For example, Month is an attribute in the Time Dimension.

Hierarchy: The specification of levels that represents relationship between different attributes within a dimension. For example, one possible hierarchy in the Time dimension is Year → Quarter → Month → Day.

Fact Table: A fact table is a table that contains the measures of interest. For example, sales amount would be such a measure. This measure is stored in the fact table with the appropriate granularity. For example, it can be sales amount by store by day. In this case, the fact table would contain three columns: A date column, a store column, and a sales amount column.

Lookup Table: The lookup table provides the detailed information about the attributes. For example, the lookup table for the Quarter attribute would include a list of all of the quarters available in the data warehouse. Each row (each quarter) may have several fields, one for the unique ID that identifies the quarter, and one or more additional fields that specifies how that particular quarter is represented on a report (for example, first quarter of 2001 may be represented as "Q1 2001" or "2001 Q1").

A dimensional model includes fact tables and lookup tables. Fact tables connect to one or more lookup tables, but fact tables do not have direct relationships to one another. Dimensions and hierarchies are represented by lookup tables. Attributes are the non-key columns in the lookup tables.

In designing data models for data warehouses / data marts, the most commonly used schema types are Star Schema and Snowflake Schema.

Star Schema: In the star schema design, a single object (the fact table) sits in the middle and is radially connected to other surrounding objects (dimension lookup tables) like a star. A star schema can be simple or complex. A simple star consists of one fact table; a complex star can have more than one fact table.

Snowflake Schema: The snowflake schema is an extension of the star schema, where each point of the star explodes into more points. The main advantage of the snowflake schema is the improvement in query performance due to minimized disk storage requirements and joining smaller lookup tables. The main disadvantage of the snowflake schema is the additional maintenance efforts needed due to the increase number of lookup tables.

Whether one uses a star or a snowflake largely depends on personal preference and business needs.