Sunday, February 19, 2012

Creating index in a large table

Hello:
We have a table with millions of records. Not long ago, we added a new
column to this large table, and we like to populate this new column with data
and create an index for it.
Would there be any time difference if we create an index (non-clustered)
before we populate it with data, or after we populate it with data? In
another word, is it true that it takes lesser time to create an index in a
column with only null?
Thanks,
Q
Q:
I would hazard a guess that you are better off to populate the new column
with data and then create the non-clustered index rather than the other way
around. (Assumption: you have an existing Clustered Index on the table that
is hopefully on an integer field)
Rationale: When you create a non-clustered index on a table that has a
clustered index you create a B-Tree structure that has the cluster keys.
If you create the index and then populate, you are making the updates work
harder because it has to update the leaf level of the clustered index (where
the actual data is) as well as insert or manipulate the non-clustered index.
As updates occur your index will have page splits and the index pages will
become fragmented.
If you populate and then create the index you will create the index B-Tree
structure all in one shot and it will not be fragmented.
David Lundell
Principal Consultant and Trainer
www.MutuallyBeneficial.com
David@.MutuallyBeneficial.com
"Q" <Q@.discussions.microsoft.com> wrote in message
news:A2B654F6-B13D-40E2-A7DA-852058447BE4@.microsoft.com...
> Hello:
> We have a table with millions of records. Not long ago, we added a new
> column to this large table, and we like to populate this new column with
> data
> and create an index for it.
> Would there be any time difference if we create an index (non-clustered)
> before we populate it with data, or after we populate it with data? In
> another word, is it true that it takes lesser time to create an index in a
> column with only null?
> Thanks,
> Q
|||Hello David:
Thanks for your detail reply. Good to know that it would create more work
if there is already a clustered index and then if we add a new column, a
non-clustered index to the new column, then populate data to this new column.
There is no clustered index in this table. So can I assume there is no
overhead in insert/update, so there will not be fragmentation in split index
page?
Thanks
Q
"David Lundell" wrote:

> Q:
> I would hazard a guess that you are better off to populate the new column
> with data and then create the non-clustered index rather than the other way
> around. (Assumption: you have an existing Clustered Index on the table that
> is hopefully on an integer field)
> Rationale: When you create a non-clustered index on a table that has a
> clustered index you create a B-Tree structure that has the cluster keys.
> If you create the index and then populate, you are making the updates work
> harder because it has to update the leaf level of the clustered index (where
> the actual data is) as well as insert or manipulate the non-clustered index.
> As updates occur your index will have page splits and the index pages will
> become fragmented.
> If you populate and then create the index you will create the index B-Tree
> structure all in one shot and it will not be fragmented.
> --
> David Lundell
> Principal Consultant and Trainer
> www.MutuallyBeneficial.com
> David@.MutuallyBeneficial.com
> "Q" <Q@.discussions.microsoft.com> wrote in message
> news:A2B654F6-B13D-40E2-A7DA-852058447BE4@.microsoft.com...
>
>
|||Create the index *after* updating the column with data. Creating the
index first will add overhead during the update process.
On Tue, 10 Jan 2006 15:34:01 -0800, "Q" <Q@.discussions.microsoft.com>
wrote:

>Hello:
>We have a table with millions of records. Not long ago, we added a new
>column to this large table, and we like to populate this new column with data
>and create an index for it.
>Would there be any time difference if we create an index (non-clustered)
>before we populate it with data, or after we populate it with data? In
>another word, is it true that it takes lesser time to create an index in a
>column with only null?
>Thanks,
>Q
|||Q:
You actually still get the overhead even with a non-clustered index on a
table that does not have a clustered index. Although it is fairly unusual
for heavily used tables to not have a clustered index. It would be wise to
take an overall look at your indexing strategy
David Lundell
Principal Consultant and Trainer
www.MutuallyBeneficial.com
David@.MutuallyBeneficial.com
"Q" <Q@.discussions.microsoft.com> wrote in message
news:056EE00B-AD6E-464E-8598-C2872BC1BDCB@.microsoft.com...[vbcol=seagreen]
> Hello David:
> Thanks for your detail reply. Good to know that it would create more work
> if there is already a clustered index and then if we add a new column, a
> non-clustered index to the new column, then populate data to this new
> column.
> There is no clustered index in this table. So can I assume there is no
> overhead in insert/update, so there will not be fragmentation in split
> index
> page?
> Thanks
> Q
> "David Lundell" wrote:
|||Q wrote:
> Hello:
> We have a table with millions of records. Not long ago, we added a
> new column to this large table, and we like to populate this new
> column with data and create an index for it.
> Would there be any time difference if we create an index
> (non-clustered) before we populate it with data, or after we populate
> it with data? In another word, is it true that it takes lesser time
> to create an index in a column with only null?
> Thanks,
> Q
Definitely update before the index. The update process is going to cause
excessive logging because of the number of rows - which means you should
update in batches (say 100,000 rows at a time if possible). Each update will
require a corresponding index update if th eindex is in place, causing even
more work. Create the index after the column is loaded with data for the
rows in question.
David Gugick
Quest Software
|||Thanks!
"David Gugick" wrote:

> Q wrote:
> Definitely update before the index. The update process is going to cause
> excessive logging because of the number of rows - which means you should
> update in batches (say 100,000 rows at a time if possible). Each update will
> require a corresponding index update if th eindex is in place, causing even
> more work. Create the index after the column is loaded with data for the
> rows in question.
> --
> David Gugick
> Quest Software
>
>
|||Thanks!
"bradsbulkmail@.comcast.net" wrote:

> Create the index *after* updating the column with data. Creating the
> index first will add overhead during the update process.
> On Tue, 10 Jan 2006 15:34:01 -0800, "Q" <Q@.discussions.microsoft.com>
> wrote:
>

No comments:

Post a Comment