Very often questions about the best position a column should be put in within a composite index come out into forums and Oracle discussions. The last question I have contributed to and tried to answer has been raised up in a French Forum. The original poster was wondering whether it is a good idea to place the very often repeated column (contains duplicates) at the leading edge of the index or not.
First of all, in contrast to my usual style of blogging I am not going to provide a SQL model on which I am going to expand my knowledge of index design. However, for those who want to learn and master indexes I would encourage them to read the world expert person in this field, Richard Foote. He has an excellent blog with several articles about almost all what one has to know about indexes and not only the widely used b-tree indexes but on all other types of indexes including bitmap indexes, function based indexes, partitioned indexes, exadata storage indexes etc..
The second reference is as always Jonathan Lewis blog in which you can find several articles about index design, index efficiency and index maintenance. In addition, it is not sufficient to know how to design precise index; you need to know as well how your index will evolve with the delete, insert or update operations their underlined tables will undergo during the lifecycle of the application they participate to its normal functioning.
The third reference is the book Relational Database Index Design and the Optimizers which extends the index design to several databases including DB2, Oracle and SQL Server
I, from time to time, come to read very interesting articles about indexes in this web site that I am following via twitter. It contains valuable index design information which, according to what I have read up to now, is pertinent, correct and back up all what I have learned from Jonathan Lewis, Richard Foote and from my own professional experience.
That’s said, I will post here below few of my answers and articles(and Jonathan Lewis articles) about index design as a small answer to a lot of questions about index design
- On the importance of the leading index columns that should be the ones on which an equality predicate is applied
- Indexing Foreign keys
- Redundant Indexes
- Global or Local Partitioned Index
- Compressing indexes basic part and cost of index compression
I am planning to write several other articles on indexes and I will be completing the above list as far as I will go with this publishing task
My answer to the original poster question about the importance of the number of distinct values property of an index candidate column is that the starting index column decision is not driven by its number of distinct values. It is instead driven by:
- The nature of the query where clause he has to cover
- The nature of the predicate (equality, inequality, etc..) applied on the starting column
- The constant desire to cover with the same index one or a couple of other queries
- The constant desire to cover with the same index a foreign key deadlock threat : sometime just by reversing the columns order we succeed to cover the current query and an existing foreign key
- The constant desire to avoid redundant indexes
And finally comes up the reason for which one has to consider placing the column with the small number of distinct values at the leading edge of the index: Compression. If you start your index with the more often duplicated column you will make a good index compression reducing, efficiently, the size of that index which means it will be very quickly put into the buffer cache and will be kept there more often due to its small size.