AfricaDotNet
www.AfricaDotNet.com Continues the .NET Conversation in Africa

SQL Server partitioning

rated by 0 users
This post has 1 Reply | 1 Follower

Top 10 Contributor
Male
Posts 22
Ben Chege Posted: 05-16-2009 1:53 AM

Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4

While the feature has been available on sql server 2005, I tend to think that most people don’t use it since when people upgrade databases they don’t look for new features in the new version that the new version has to offer.

One of these features is partitioning tables, basically the feature enables you to split your table into partitions for faster retrieval, a whole article is here (http://msdn.microsoft.com/en-us/library/ms345146(SQL.90).aspx) but an overview wouldn’t hurt.

The idea is to spit the table based on values of a certain field maybe a date or a month if you have too much data coming into the table per day or month. The field is called the Partitioning Key, you need to create a Partitioning function that basically states where the data should be moved to in ranges. Then you need to create a Partition Scheme to direct the partitions to specific filegroups. When you define a partition scheme, you must make sure to name a filegroup for every partition, even if multiple partitions will reside on the same filegroup. If all partitions are to reside on the same file group you can create the scheme on the PRIMARY filegroup. Then finally create a table that uses that Partition Scheme.

I will try and do the above for a trouble ticketing system for a GSM company that has issues left right and center.

FileGroups:

If you want to place a partitioned table on multiple files for better I/O balancing, you will need to create at least one filegroup. Filegroups can consist of one or more files, and each partition must map to a filegroup. A single filegroup can be used for multiple partitions but for better data management, such as for more granular backup control, you should design your partitioned tables so that only related or logically grouped data resides on the same filegroup. Using ALTER DATABASE, you can add a logical filegroup name, and then add files. To create a filegroup named 2009Q1 to 2009Q4 for the TroubleTickets database, use ALTER DATABASE in the following way:

If you need to add files groups to your database run the alter database commands as seen below.

ALTER DATABASE TroubleTickets ADD FILEGROUP [2009Q1];

ALTER DATABASE TroubleTickets ADD FILEGROUP [2009Q2];

ALTER DATABASE TroubleTickets ADD FILEGROUP [2009Q3];

ALTER DATABASE TroubleTickets ADD FILEGROUP [2009Q4];

ALTER DATABASE TroubleTickets

ADD FILE

  (NAME = N'2009Q1',

  FILENAME = N'C:\TroubleTickets\2009Q1.ndf',

  SIZE = 2MB,

  MAXSIZE = 100MB,

  FILEGROWTH = 2MB)

TO FILEGROUP [2009Q1]

 

ALTER DATABASE TroubleTickets

ADD FILE

  (NAME = N'2009Q2',

  FILENAME = N'C:\TroubleTickets\2009Q2.ndf',

  SIZE = 5MB,

  MAXSIZE = 100MB,

  FILEGROWTH = 5MB)

TO FILEGROUP [2009Q2]

 

ALTER DATABASE TroubleTickets

ADD FILE

  (NAME = N'2009Q4',

  FILENAME = N'C:\TroubleTickets\2009Q4.ndf',

  SIZE = 5MB,

  MAXSIZE = 100MB,

  FILEGROWTH = 5MB)

TO FILEGROUP [2009Q4]


To create a partition function


CREATE PARTITION FUNCTION TroubleTicketRangePFN(datetime)

AS

RANGE LEFT FOR VALUES  ('20090101', '20090201', '20090301',

'20090401', '20090501', '20090601', '20090701',

'20090801', '20090901','20091001', '20091101', '20091201');


In the CREATE PARTITION FUNCTION statement, you can specify the value criteria on which the table will be partitioned. Specifying the LEFT argument will include the value as the upper boundary of the first partition. Specifying the RIGHT argument will include the first value as the lower boundary of the second partition.

note As
per the SQL Server documentation: "All data types are valid for use as partitioning columns, except text, ntext, image, xml, timestamp, varchar(max), nvarchar(max), varbinary(max), alias data types, or CLR user-defined data types"

Create the Partition Scheme

Once you have created a partition function, you must associate it with a partition scheme to direct the partitions to specific filegroups. When you define a partition scheme, you must make sure to name a filegroup for every partition, even if multiple partitions will reside on the same filegroup. For the range partition created previously (TroubleTicketRangePFN), there are 12 partitions; the last, and empty, partition will be created in the PRIMARY filegroup. There is no need for a special location for this partition because it will never contain data.

CREATE PARTITION SCHEME TroubleTicketRangePS AS

PARTITION TroubleTicketRangePFN

TO ([2009Q1],

 [2009Q1], [2009Q1], [2009Q2], [2009Q2],

[2009Q2], [2009Q3], [2009Q3], [2009Q3], [2009Q3],

[2009Q4], [2009Q4], [2009Q4],[PRIMARY]);

Create a table using the scheme


CREATE TABLE [dbo].[TroubleTicket] 

(

   [TroubleTicketID] [bigint] NOT NULL,

   [AssighedEmployeeID] [int] NULL,

   [TroubleTicketSummary] VARCHAR(255) NOT NULL,

   [CreationDate] [datetime] NOT NULL,

   [CreatedBy] [int] NULL

)

ON TroubleTicketRangePS(CreationDate)

Then run a bunch of INSERT statements to add to the table.

INSERT INTO [TroubleTicket] (

[TroubleTicketID],

[AssighedEmployeeID] ,

[TroubleTicketSummary],

[CreationDate] ,

[CreatedBy])

VALUES (1,1,'TEST','2009-01-16 00:00:00.000',1)

 

INSERT INTO [TroubleTicket] (

[TroubleTicketID],

[AssighedEmployeeID] ,

[TroubleTicketSummary],

[CreationDate] ,

[CreatedBy])

VALUES (1,1,'TEST','2009-04-16 00:00:00.000',1)

 

INSERT INTO [TroubleTicket] (

[TroubleTicketID],

[AssighedEmployeeID] ,

[TroubleTicketSummary],

[CreationDate] ,

[CreatedBy])

VALUES (1,1,'TEST','2009-07-16 00:00:00.000',1)

 

INSERT INTO [TroubleTicket] (

[TroubleTicketID],

[AssighedEmployeeID] ,

[TroubleTicketSummary],

[CreationDate] ,

[CreatedBy])

VALUES (1,1,'TEST','2009-11-16 00:00:00.000',1)

To test what partition a certain value will go to use the below statement.

SELECT $PARTITION. TroubleTicketRangePFN ('2009-05-16 00:00:00.000') ;

Or to list all the values in a certain partition use

SELECT * FROM [TroubleTicket]

WHERE $PARTITION.TroubleTicketRangePFN(CreationDate) = 2 -- where 2 is the partition'

 

To list what partition is in which filegroup

select ds.name AS [Filegroup Name], ds.type, destination_id AS [partition number], dds.partition_scheme_id, ps.name as [partition scheme]

from sys.data_spaces ds

join sys.destination_data_spaces dds

on (ds.data_space_id = dds.data_space_id)

join sys.partition_schemes ps

on (ps.data_space_id = dds.partition_scheme_id)

order by ds.name, ps.name ASC

 

Top 10 Contributor
Male
Posts 25

This month's edition of SQL Magazine also makes an interesting read

 

here it is for y'all

Normal 0 false false false MicrosoftInternetExplorer4

Manage large tables by dividing them into

discreet, non-overlapping sections

Fact tables tend to grow very large, very fast. Sometimes, a fact table can become so large that it can be difficult to manage, and SQL queries can be adversely affected by the sheer size of the data set involved. However, you can horizontally partition a fact table to make it easier to work with. A horizontally partitioned table is one in which the rows are divided into discreet, non-overlapping sections. Each section in a horizontally partitioned table is defined by a range of values, such as by date, geographic area, or customers’ last names. (Note: You might consider storing each section on a different hard

disk to enhance query performance.) Although it isn’t difficult to horizontally partition a table, it does require

some advance planning because you’re dealing with large amounts of data. Let’s explore when and why

you’d want to horizontally partition a fact table, and how to create a partition function, a partition scheme,

and a partitioned table using SQL Server 2005’s built-in functionality.

Why Partition a Fact Table?

Large tables (i.e., tables with hundreds of millions of rows) can be difficult to manage because of their

size and the amount of time it takes to do anything with them (e.g., rebuild an index). In a transactional

database, the associative tables (i.e., those tables that involve the many to many—M:N—relationship) are

often the tables with the most rows in the database. In dimensional modeling, a fact table is the equivalent

of an associative table. Like an associative table in a transactional database, a fact table often has many

more rows than its related dimensions, perhaps even as many as (# of rows in dimension 1) x (# of rows in

dimension 2) … x (# of rows in dimension n) rows. Partitioning breaks these monster tables into manageable

chunks. If your maintenance time windows are shrinking or the amount of data to be processed is growing, you can partition the table and perform tasks, such as backup and restore operations or use the Database Contingency Checker (DBCC) to update table statistics, by partition instead of for the entire table. SQL Server 2005 treats the many sections of a partitioned table as a single logical entity, and the multiple partitions appear as a single table to end users. The following are some reasons to horizontally

partition a table:

• You can better control where each partition is placed in storage and leverage multiple read/write heads for fast query resolution.

• You can back up and restore by partition, indexes can be rebuilt and reorganized by partition, and the indexes themselves can be partitioned.

• You can direct queries that include a WHERE clause that contains either the partitioning column or an indexed column to the appropriate partition

for resolution.

• You can reduce lock escalations and lock-management overhead because locking is limited to

partitions.

• You can merge or split partitions fairly easily if multiple partitions are in the same file group.

So which tables are the best candidates for horizontal partitioning? Very large tables, tables that you expect to grow very large in the near future, and tables that can be intuitively partitioned based on their business value (e.g.,by fiscal year). These tables must include a column whose values are NOT NULL and that can be used to divide the rows into discreet, non-overlapping sections such as a column containing sales dates.

If your database contains a large table against which queries and updates aren’t performing the way

you think they should, consider testing how partitioning might affect query performance in your environment.

SQL Server 2005 is partition-aware, meaning that if slow-running queries include a WHERE clause

that contains the partitioning column or the indexed column (and the index is also partitioned), only the relevant partition is accessed for query resolution. This functionality can significantly help performance.

 

Creating a Partition Function

To partition a table, you need to use a function that’s composed of a partitioning column and a set of  boundaries. To createthe partition function for the SALES fact table you’d run the command

 

CREATE PARTITION FUNCTION

MyPartitionFunctionLeft

(datetime)

AS RANGE LEFT

FOR VALUES (‘1/01/2003’,

‘1/01/2005’, ‘1/01/2007)

MyPartitionFunctionLeft is the name of the partitioning function, (datetime) is the data type of the partitioning column, and RANGE LEFT stipulates how to divide up the data values that are bound by the FOR VALUES dates.     

There’s two ways to set the boundaries: RANGE LEFT or RANGE RIGHT. The RANGE LEFT clause divides the data from the lowest value to the highest value (i.e., in ascending order). The RANGE RIGHT clause

divides the data from the highest value to the lowest value (i.e., in descending order). The partitioning column is often a datetime data type such as the Date_of_Event column (shown in Figure 1).

Separating datetime records into non-overlapping groups is straightforward. For example, if your business rules and known operational queries indicate that partitioning the table on the date of a sales event is reasonable, then you could partition the data into two-year groupings, as I did in the previous partition function command. Partitioning RANGE LEFT divides the data into the value ranges. If the date of a sales event was June 23, 2004, you’d find that record in partition 2 (P2). If you want to create the partition function with the RANGE RIGHT clause, you’d run the command

 

CREATE PARTITION FUNCTION

MyPartitionFunction datetime

AS RANGE RIGHT

FOR VALUES (‘1/01/2003’,

‘1/01/2005’, ‘1/01/2007)

 

Partioning RANGE RIGHT divides the data into the valve ranges shown in Figure 3. I recommend, for the

sake of consistency and for the ease of querying, that you choose one range declaration (i.e., RANGE LEFT

or RANGE RIGHT) and stick with it throughout all the partitioned tables that you create in your environment.

Each range of values in a partition is restricted by boundaries that are specified in the FOR VALUES clause. Note that if you’re using datetime data types for boundary values and your company has an office in

Europe, you’ll have to decide on an international standard for datetime, so that it’s uniform across your company. SQL Server assumes that us_english is the default language for the session, so if that’s not the case, you’ll want to create a user-defined function (UDF) that will convert various date formats into us_english, and reference that UDF in the FOR VALUES clause. You don’t have to use literals in the FOR VALUES clause; you can reference variables, functions, and UDFs.

Creating a Partition Scheme

Now that you’ve created a partition function, you need to create a partition scheme. The partition scheme

maps partitions to various file groups, as shown in the following command:

CREATE PARTITION SCHEME

MyPartitionScheme

AS MyPartitionFunction

TO (MyFilegroup1, MyFilegroup2,

MyFilegroup3, MyFilegroup4,

MyFilegroup5)

MyPartitionScheme is the name of the partitioning scheme, and MyPartitionFunction refers to the partition

function. This command maps the boundary values into partitions that are then assigned to one or more file groups. Data rows with Date_of_Event datetime values prior to 1/01/03 are assigned to MyFilegroup1.

Rows with datetime values greater than or equal to 1/01/03 and prior to 1/01/05 are assigned to

MyFilegroup2. Rows with datetime values greater than or equal to 1/01/05 and prior to 1/01/07 are assigned

to MyFilegroup3. All other rows with datetime values greater than or equal to 1/01/07 are assigned to

MyFilegroup4. For each set of boundary values (remember, boundary values are in the FOR VALUES clause of the partition function statement), there will be (# of boundary values + 1) partitions. The previous

CREATE PARTITION SCHEME statement includes three boundary values and four partitions. No matter

whether the partitions are created RANGE RIGHT or RANGE LEFT, there will always be (# of boundary

values + 1) partitions, up to 1000 partitions per table. So why are there five file groups instead of four in this

example? (Remember, the partition function statement had only three boundary values.) The fifth file group is the optional “next-used” file group. Let’s look at how this file group is used and how important it is in the

partition scheme. In the CREATE PARTITION FUNCTION example, the last boundary value is 1/01/2007, so records with a Date_of_Event value greater than or equal to the 1/01/2007 will be stored in partition 4 (P4). When 1/01/2009 rolls around, you’ll have to create a new partition to maintain the storage scheme

that you’ve established. If your original CREATE PARTITION SCHEME statement didn’t include a

next-used file group, you wouldn’t be able to split P4 into P4, which will hold data from 2007 through the

end of 2008, and partition 5 (P5), which will hold data from 1/01/2009 and beyond. If your partitioning plan

involves periodically creating new partitions to hold new data, as in this example, make sure you include the

next-used file group in your CREATE PARTITION SCHEME statement. You don’t have to assign one partition to one file group, as I did in Figure 4. Instead, you can map multiple partitions to a single file group or you can map all the partitions to one file group. However, you can’t map one partition to multiple file groups. Creating the partition scheme might be the most important step in the partitioning process. In the

future, you might want to combine the data from two adjacent partitions into one partition, add a boundary

value to an existing partition, or move data from a populated partition into an empty partition. To perform

these operations, you’ll want to do some advance planning and create the partition scheme so that it will support these activities. SQL Server 2005 Books Online (BOL) provides some good planning resources,

including “Planning Guidelines for Partitioned Tables and Indexes”

 (msdn2.microsoft.com/en-us/library/ ms180767.aspx).

 

Creating a Partitioned Table

Creating a partitioned table isn’t all that different from creating a regular table; you just have to reference the

name of the partition scheme in the ON clause. To do so, run the command

 

CREATE TABLE SALESFact

(SalesFact_KEY bigint identity

(1, 1) primary not clustered NOT

NULL,

all the other columns in the

table, ending up with the

partitioning column,

Date_of_Event datetime

NOT NULL)

ON MyPartitionScheme (Date_of_Event)

 

By specifying the name of a partition scheme, you’re indicating that this table is a partitioned table. Obviously, the partition scheme and the partition function have to exist in the database before you can create the table.

You can combine the data from multiple partitions into one partition. However, you can only merge two

adjacent partitions at a time, so if you’re trying to “unpartition” a table that has multiple partitions, you’ll

have to repeat this step many times. To merge two partitions, run the command

ALTER PARTITION FUNCTION

MyPartitionFunction()

MERGE RANGE (‘1/01/2003’)

This command will merge the partitions as follows: partition 1 (P1) will merge into P2, meaning that P2 will

then contain all rows with a Date_of_Event value prior to 1/01/05. Internally (i.e., in the sys.partitions system

table), the partitions will be renumbered, starting with one (not zero). P1 and P2 will become P1, P3 will

become P2, and P4 will become P3. I recommend sketching out the DDL before you merge any partitions

because you could potentially cripple your operations for sustained periods of time if you’re not careful

when you perform these merges. In fact, Eric Hanson, Microsoft’s lead program manager for query processing on the SQL Server Relational Engine Team, recommends performing splits and merges only on empty partitions, even if that means you have to temporarily empty a partition.

 

Easily Manage Monster Tables

SQL Server 2005’s ability to horizontally partition a table into non-overlapping sections and place

each section on a separate partition makes it easy to manage large fact tables. Because the SQL Server 2005 Enterprise and Developer Editions are partitionaware, properly-written queries will access only the

rows within the relevant partition, thus running much faster than if they had to traverse the contents of the

entire table.

 

Page 1 of 1 (2 items) | RSS
(c) AfricaDotNet
Powered by Community Server (Non-Commercial Edition), by Telligent Systems