While the feature has been available on sql server 2005, I tend to think that most people don’t use it since when people upgrade databases they don’t look for new features in the new version that the new version has to offer.
One of these features is partitioning tables, basically the feature enables you to split your table into partitions for faster retrieval, a whole article is here (http://msdn.microsoft.com/en-us/library/ms345146(SQL.90).aspx) but an overview wouldn’t hurt.
The idea is to spit the table based on values of a certain field maybe a date or a month if you have too much data coming into the table per day or month. The field is called the Partitioning Key, you need to create a Partitioning function that basically states where the data should be moved to in ranges. Then you need to create a Partition Scheme to direct the partitions to specific filegroups. When you define a partition scheme, you must make sure to name a filegroup for every partition, even if multiple partitions will reside on the same filegroup. If all partitions are to reside on the same file group you can create the scheme on the PRIMARY filegroup. Then finally create a table that uses that Partition Scheme.
I will try and do the above for a trouble ticketing system for a GSM company that has issues left right and center.
FileGroups:
If you want to place a partitioned table on multiple files for better I/O balancing, you will need to create at least one filegroup. Filegroups can consist of one or more files, and each partition must map to a filegroup. A single filegroup can be used for multiple partitions but for better data management, such as for more granular backup control, you should design your partitioned tables so that only related or logically grouped data resides on the same filegroup. Using ALTER DATABASE, you can add a logical filegroup name, and then add files. To create a filegroup named 2009Q1 to 2009Q4 for the TroubleTickets database, use ALTER DATABASE in the following way:
If you need to add files groups to your database run the alter database commands as seen below.
ALTER DATABASE TroubleTickets ADD FILEGROUP [2009Q1];
ALTER DATABASE TroubleTickets ADD FILEGROUP [2009Q2];
ALTER DATABASE TroubleTickets ADD FILEGROUP [2009Q3];
ALTER DATABASE TroubleTickets ADD FILEGROUP [2009Q4]; ALTER DATABASE TroubleTickets
ADD FILE
(NAME = N'2009Q1',
FILENAME = N'C:\TroubleTickets\2009Q1.ndf',
SIZE = 2MB,
MAXSIZE = 100MB,
FILEGROWTH = 2MB)
TO FILEGROUP [2009Q1]
ALTER DATABASE TroubleTickets
(NAME = N'2009Q2',
FILENAME = N'C:\TroubleTickets\2009Q2.ndf',
SIZE = 5MB,
FILEGROWTH = 5MB)
TO FILEGROUP [2009Q2]
(NAME = N'2009Q4',
FILENAME = N'C:\TroubleTickets\2009Q4.ndf',
TO FILEGROUP [2009Q4]
To create a partition function
CREATE PARTITION FUNCTION TroubleTicketRangePFN(datetime)
AS
RANGE LEFT FOR VALUES ('20090101', '20090201', '20090301',
'20090401', '20090501', '20090601', '20090701',
'20090801', '20090901','20091001', '20091101', '20091201');
In the CREATE PARTITION FUNCTION statement, you can specify the value criteria on which the table will be partitioned. Specifying the LEFT argument will include the value as the upper boundary of the first partition. Specifying the RIGHT argument will include the first value as the lower boundary of the second partition. note As per the SQL Server documentation: "All data types are valid for use as partitioning columns, except text, ntext, image, xml, timestamp, varchar(max), nvarchar(max), varbinary(max), alias data types, or CLR user-defined data types"
Create the Partition Scheme
Once you have created a partition function, you must associate it with a partition scheme to direct the partitions to specific filegroups. When you define a partition scheme, you must make sure to name a filegroup for every partition, even if multiple partitions will reside on the same filegroup. For the range partition created previously (TroubleTicketRangePFN), there are 12 partitions; the last, and empty, partition will be created in the PRIMARY filegroup. There is no need for a special location for this partition because it will never contain data.
CREATE PARTITION SCHEME TroubleTicketRangePS AS
PARTITION TroubleTicketRangePFN
TO ([2009Q1],
[2009Q1], [2009Q1], [2009Q2], [2009Q2],
[2009Q2], [2009Q3], [2009Q3], [2009Q3], [2009Q3],
[2009Q4], [2009Q4], [2009Q4],[PRIMARY]); Create a table using the scheme
CREATE TABLE [dbo].[TroubleTicket]
(
[TroubleTicketID] [bigint] NOT NULL,
[AssighedEmployeeID] [int] NULL,
[TroubleTicketSummary] VARCHAR(255) NOT NULL,
[CreationDate] [datetime] NOT NULL,
[CreatedBy] [int] NULL
)
ON TroubleTicketRangePS(CreationDate)
Then run a bunch of INSERT statements to add to the table.
INSERT INTO [TroubleTicket] (
[TroubleTicketID],
[AssighedEmployeeID] ,
[TroubleTicketSummary],
[CreationDate] ,
[CreatedBy])
VALUES (1,1,'TEST','2009-01-16 00:00:00.000',1)
VALUES (1,1,'TEST','2009-04-16 00:00:00.000',1)
VALUES (1,1,'TEST','2009-07-16 00:00:00.000',1)
VALUES (1,1,'TEST','2009-11-16 00:00:00.000',1)
To test what partition a certain value will go to use the below statement.
SELECT $PARTITION. TroubleTicketRangePFN ('2009-05-16 00:00:00.000') ;
Or to list all the values in a certain partition use
SELECT * FROM [TroubleTicket]
WHERE $PARTITION.TroubleTicketRangePFN(CreationDate) = 2 -- where 2 is the partition'
To list what partition is in which filegroup
select ds.name AS [Filegroup Name], ds.type, destination_id AS [partition number], dds.partition_scheme_id, ps.name as [partition scheme]
from sys.data_spaces ds
join sys.destination_data_spaces dds
on (ds.data_space_id = dds.data_space_id)
join sys.partition_schemes ps
on (ps.data_space_id = dds.partition_scheme_id)
order by ds.name, ps.name ASC
This month's edition of SQL Magazine also makes an interesting read
here it is for y'all
Normal 0 false false false MicrosoftInternetExplorer4
Manage large tables by dividing them into
discreet, non-overlapping sections
Fact tables tend to grow very large, very fast. Sometimes, a fact table can become so large that it can be difficult to manage, and SQL queries can be adversely affected by the sheer size of the data set involved. However, you can horizontally partition a fact table to make it easier to work with. A horizontally partitioned table is one in which the rows are divided into discreet, non-overlapping sections. Each section in a horizontally partitioned table is defined by a range of values, such as by date, geographic area, or customers’ last names. (Note: You might consider storing each section on a different hard
disk to enhance query performance.) Although it isn’t difficult to horizontally partition a table, it does require
some advance planning because you’re dealing with large amounts of data. Let’s explore when and why
you’d want to horizontally partition a fact table, and how to create a partition function, a partition scheme,
and a partitioned table using SQL Server 2005’s built-in functionality.
Why Partition a Fact Table?
Large tables (i.e., tables with hundreds of millions of rows) can be difficult to manage because of their
size and the amount of time it takes to do anything with them (e.g., rebuild an index). In a transactional
database, the associative tables (i.e., those tables that involve the many to many—M:N—relationship) are
often the tables with the most rows in the database. In dimensional modeling, a fact table is the equivalent
of an associative table. Like an associative table in a transactional database, a fact table often has many
more rows than its related dimensions, perhaps even as many as (# of rows in dimension 1) x (# of rows in
dimension 2) … x (# of rows in dimension n) rows. Partitioning breaks these monster tables into manageable
chunks. If your maintenance time windows are shrinking or the amount of data to be processed is growing, you can partition the table and perform tasks, such as backup and restore operations or use the Database Contingency Checker (DBCC) to update table statistics, by partition instead of for the entire table. SQL Server 2005 treats the many sections of a partitioned table as a single logical entity, and the multiple partitions appear as a single table to end users. The following are some reasons to horizontally
partition a table:
• You can better control where each partition is placed in storage and leverage multiple read/write heads for fast query resolution.
• You can back up and restore by partition, indexes can be rebuilt and reorganized by partition, and the indexes themselves can be partitioned.
• You can direct queries that include a WHERE clause that contains either the partitioning column or an indexed column to the appropriate partition
for resolution.
• You can reduce lock escalations and lock-management overhead because locking is limited to
partitions.
• You can merge or split partitions fairly easily if multiple partitions are in the same file group.
So which tables are the best candidates for horizontal partitioning? Very large tables, tables that you expect to grow very large in the near future, and tables that can be intuitively partitioned based on their business value (e.g.,by fiscal year). These tables must include a column whose values are NOT NULL and that can be used to divide the rows into discreet, non-overlapping sections such as a column containing sales dates.
If your database contains a large table against which queries and updates aren’t performing the way
you think they should, consider testing how partitioning might affect query performance in your environment.
SQL Server 2005 is partition-aware, meaning that if slow-running queries include a WHERE clause
that contains the partitioning column or the indexed column (and the index is also partitioned), only the relevant partition is accessed for query resolution. This functionality can significantly help performance.
Creating a Partition Function
To partition a table, you need to use a function that’s composed of a partitioning column and a set of boundaries. To createthe partition function for the SALES fact table you’d run the command
CREATE PARTITION FUNCTION
MyPartitionFunctionLeft
(datetime)
AS RANGE LEFT
FOR VALUES (‘1/01/2003’,
‘1/01/2005’, ‘1/01/2007)
MyPartitionFunctionLeft is the name of the partitioning function, (datetime) is the data type of the partitioning column, and RANGE LEFT stipulates how to divide up the data values that are bound by the FOR VALUES dates.
There’s two ways to set the boundaries: RANGE LEFT or RANGE RIGHT. The RANGE LEFT clause divides the data from the lowest value to the highest value (i.e., in ascending order). The RANGE RIGHT clause
divides the data from the highest value to the lowest value (i.e., in descending order). The partitioning column is often a datetime data type such as the Date_of_Event column (shown in Figure 1).
Separating datetime records into non-overlapping groups is straightforward. For example, if your business rules and known operational queries indicate that partitioning the table on the date of a sales event is reasonable, then you could partition the data into two-year groupings, as I did in the previous partition function command. Partitioning RANGE LEFT divides the data into the value ranges. If the date of a sales event was June 23, 2004, you’d find that record in partition 2 (P2). If you want to create the partition function with the RANGE RIGHT clause, you’d run the command
MyPartitionFunction datetime
AS RANGE RIGHT
Partioning RANGE RIGHT divides the data into the valve ranges shown in Figure 3. I recommend, for the
sake of consistency and for the ease of querying, that you choose one range declaration (i.e., RANGE LEFT
or RANGE RIGHT) and stick with it throughout all the partitioned tables that you create in your environment.
Each range of values in a partition is restricted by boundaries that are specified in the FOR VALUES clause. Note that if you’re using datetime data types for boundary values and your company has an office in
Europe, you’ll have to decide on an international standard for datetime, so that it’s uniform across your company. SQL Server assumes that us_english is the default language for the session, so if that’s not the case, you’ll want to create a user-defined function (UDF) that will convert various date formats into us_english, and reference that UDF in the FOR VALUES clause. You don’t have to use literals in the FOR VALUES clause; you can reference variables, functions, and UDFs.
Creating a Partition Scheme
Now that you’ve created a partition function, you need to create a partition scheme. The partition scheme
maps partitions to various file groups, as shown in the following command:
CREATE PARTITION SCHEME
MyPartitionScheme
AS MyPartitionFunction
TO (MyFilegroup1, MyFilegroup2,
MyFilegroup3, MyFilegroup4,
MyFilegroup5)
MyPartitionScheme is the name of the partitioning scheme, and MyPartitionFunction refers to the partition
function. This command maps the boundary values into partitions that are then assigned to one or more file groups. Data rows with Date_of_Event datetime values prior to 1/01/03 are assigned to MyFilegroup1.
Rows with datetime values greater than or equal to 1/01/03 and prior to 1/01/05 are assigned to
MyFilegroup2. Rows with datetime values greater than or equal to 1/01/05 and prior to 1/01/07 are assigned
to MyFilegroup3. All other rows with datetime values greater than or equal to 1/01/07 are assigned to
MyFilegroup4. For each set of boundary values (remember, boundary values are in the FOR VALUES clause of the partition function statement), there will be (# of boundary values + 1) partitions. The previous
CREATE PARTITION SCHEME statement includes three boundary values and four partitions. No matter
whether the partitions are created RANGE RIGHT or RANGE LEFT, there will always be (# of boundary
values + 1) partitions, up to 1000 partitions per table. So why are there five file groups instead of four in this
example? (Remember, the partition function statement had only three boundary values.) The fifth file group is the optional “next-used” file group. Let’s look at how this file group is used and how important it is in the
partition scheme. In the CREATE PARTITION FUNCTION example, the last boundary value is 1/01/2007, so records with a Date_of_Event value greater than or equal to the 1/01/2007 will be stored in partition 4 (P4). When 1/01/2009 rolls around, you’ll have to create a new partition to maintain the storage scheme
that you’ve established. If your original CREATE PARTITION SCHEME statement didn’t include a
next-used file group, you wouldn’t be able to split P4 into P4, which will hold data from 2007 through the
end of 2008, and partition 5 (P5), which will hold data from 1/01/2009 and beyond. If your partitioning plan
involves periodically creating new partitions to hold new data, as in this example, make sure you include the
next-used file group in your CREATE PARTITION SCHEME statement. You don’t have to assign one partition to one file group, as I did in Figure 4. Instead, you can map multiple partitions to a single file group or you can map all the partitions to one file group. However, you can’t map one partition to multiple file groups. Creating the partition scheme might be the most important step in the partitioning process. In the
future, you might want to combine the data from two adjacent partitions into one partition, add a boundary
value to an existing partition, or move data from a populated partition into an empty partition. To perform
these operations, you’ll want to do some advance planning and create the partition scheme so that it will support these activities. SQL Server 2005 Books Online (BOL) provides some good planning resources,
including “Planning Guidelines for Partitioned Tables and Indexes”
(msdn2.microsoft.com/en-us/library/ ms180767.aspx).
Creating a Partitioned Table
Creating a partitioned table isn’t all that different from creating a regular table; you just have to reference the
name of the partition scheme in the ON clause. To do so, run the command
CREATE TABLE SALESFact
(SalesFact_KEY bigint identity
(1, 1) primary not clustered NOT
NULL,
…all the other columns in the
table, ending up with the
partitioning column…,
Date_of_Event datetime
NOT NULL)
ON MyPartitionScheme (Date_of_Event)
By specifying the name of a partition scheme, you’re indicating that this table is a partitioned table. Obviously, the partition scheme and the partition function have to exist in the database before you can create the table.
You can combine the data from multiple partitions into one partition. However, you can only merge two
adjacent partitions at a time, so if you’re trying to “unpartition” a table that has multiple partitions, you’ll
have to repeat this step many times. To merge two partitions, run the command
ALTER PARTITION FUNCTION
MyPartitionFunction()
MERGE RANGE (‘1/01/2003’)
This command will merge the partitions as follows: partition 1 (P1) will merge into P2, meaning that P2 will
then contain all rows with a Date_of_Event value prior to 1/01/05. Internally (i.e., in the sys.partitions system
table), the partitions will be renumbered, starting with one (not zero). P1 and P2 will become P1, P3 will
become P2, and P4 will become P3. I recommend sketching out the DDL before you merge any partitions
because you could potentially cripple your operations for sustained periods of time if you’re not careful
when you perform these merges. In fact, Eric Hanson, Microsoft’s lead program manager for query processing on the SQL Server Relational Engine Team, recommends performing splits and merges only on empty partitions, even if that means you have to temporarily empty a partition.
Easily Manage Monster Tables
SQL Server 2005’s ability to horizontally partition a table into non-overlapping sections and place
each section on a separate partition makes it easy to manage large fact tables. Because the SQL Server 2005 Enterprise and Developer Editions are partitionaware, properly-written queries will access only the
rows within the relevant partition, thus running much faster than if they had to traverse the contents of the
entire table.