How to CREATE INDEX on a large table in MySQL
Official documentation: CREATE INDEX
Performance Considerations
Creating indexes on large tables can be resource-intensive and potentially disruptive. Without proper planning, these operations can cause extended downtime, lock tables, or consume excessive server resources. Many organizations require approval for index operations on large tables. You can enforce approval processes or automated schema reviews via Bytebase.
- Table Locking: Traditional index creation locks the entire table, preventing other operations until completed.
- Resource Consumption: Building indexes on large tables requires significant CPU, memory, and disk I/O.
- Transaction Log Growth: Index creation generates substantial transaction logs, potentially filling storage.
- Replication Impact: On replicated environments, index operations must propagate to all replicas.
Understanding Large Table Challenges
What counts as "large" varies by environment, but tables with millions of rows or several GB of data typically require special consideration for index operations. The main challenges are:
- Operation duration (potentially hours)
- Table locking causing application downtime
- Server resource consumption affecting other workloads
- Replication lag on replicated environments
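Before choosing a method, it helps to check how big the table actually is. A minimal sketch using information_schema (the schema and table names are placeholders; TABLE_ROWS is only an estimate for InnoDB):
-- Approximate row count and on-disk size in GB
SELECT TABLE_NAME, TABLE_ROWS,
       ROUND((DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024 / 1024, 2) AS size_gb
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'your_database' AND TABLE_NAME = 'large_table';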
Methods for Creating Indexes on Large Tables
Using Online DDL
MySQL 5.6+ supports online DDL operations that allow concurrent DML while creating indexes:
-- Create index with ALGORITHM=INPLACE and LOCK=NONE
ALTER TABLE large_table
ADD INDEX idx_column (column_name)
ALGORITHM=INPLACE, LOCK=NONE;
This approach:
- Minimizes table locking
- Allows concurrent reads and writes
- Works for most index types in InnoDB
Limitations:
- Not all index operations support ALGORITHM=INPLACE
- Still consumes significant resources
- May fail on very large tables if concurrent writes outgrow the online change log (bounded by innodb_online_alter_log_max_size)
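If the operation aborts with error 1799 (the online change log filled up), you can raise the cap before retrying. A sketch; the 1 GB value is an illustration, not a recommendation:
-- Allow more concurrent DML to accumulate during the online ALTER
SET GLOBAL innodb_online_alter_log_max_size = 1073741824; -- 1GB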
Using pt-online-schema-change
For extremely large tables, Percona Toolkit's pt-online-schema-change
provides a reliable solution:
pt-online-schema-change --alter="ADD INDEX idx_column (column_name)" \
--host=localhost --user=username --password=password \
D=database,t=large_table --execute
This approach:
- Creates a new table with the desired structure
- Copies data in small batches
- Maintains triggers to capture ongoing changes
- Performs an atomic table swap when complete
Limitations:
- Requires double the disk space temporarily
- Adds overhead from triggers
- Doesn't work well with foreign keys unless handled carefully
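Because of the foreign-key caveat, check whether other tables reference the one you are altering before running the tool; pt-online-schema-change needs its --alter-foreign-keys-method option in that case. A sketch using information_schema (names are placeholders):
-- Find child tables whose foreign keys point at large_table
SELECT TABLE_SCHEMA, TABLE_NAME, CONSTRAINT_NAME
FROM information_schema.KEY_COLUMN_USAGE
WHERE REFERENCED_TABLE_SCHEMA = 'your_database'
  AND REFERENCED_TABLE_NAME = 'large_table';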
Using GitHub's gh-ost
gh-ost is another tool specifically designed for online schema changes:
gh-ost --user="user" --password="password" --host=hostname \
--database="db" --table="large_table" \
--alter="ADD INDEX idx_column (column_name)" \
--execute
This approach:
- Uses binary log streaming instead of triggers
- Creates minimal locking
- Provides detailed progress reporting
- Is generally safer for very large tables
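Because gh-ost streams the binary log, the server it reads from must have binary logging enabled in ROW format. A quick pre-flight check:
-- gh-ost requires log_bin = ON and binlog_format = ROW
SELECT @@log_bin, @@binlog_format;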
Using a Temporary Table for Temporary Relief
If you can't immediately create an index but need query performance, consider using a temporary table:
-- Create indexed temporary table with subset of data
CREATE TEMPORARY TABLE tmp_table AS
SELECT * FROM large_table WHERE some_condition
LIMIT 1000000;
CREATE INDEX idx_column ON tmp_table (column_name);
-- Query the temporary table
SELECT * FROM tmp_table WHERE column_name = 'value';
Monitoring Index Creation Progress
For Online DDL Operations
-- Check progress in performance_schema
SELECT EVENT_NAME, WORK_COMPLETED, WORK_ESTIMATED,
ROUND(WORK_COMPLETED/WORK_ESTIMATED*100, 2) AS "% Complete"
FROM performance_schema.events_stages_current
WHERE EVENT_NAME LIKE 'stage/innodb/alter%';
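These stage events are not collected by default; if the query above returns nothing, enable the relevant instruments and consumers first:
-- Enable InnoDB ALTER TABLE stage instrumentation
UPDATE performance_schema.setup_instruments
SET ENABLED = 'YES', TIMED = 'YES'
WHERE NAME LIKE 'stage/innodb/alter%';
UPDATE performance_schema.setup_consumers
SET ENABLED = 'YES'
WHERE NAME LIKE '%stages%';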
For MySQL 8.0+ (Performance Schema)
-- Get detailed progress information
SELECT * FROM performance_schema.events_stages_current
WHERE EVENT_NAME LIKE 'stage/innodb/alter%';
General Progress Monitoring
-- Monitor through process list
SELECT * FROM information_schema.processlist
WHERE info LIKE 'ALTER TABLE%';
-- Check for lock waits
SELECT * FROM sys.innodb_lock_waits;
Common Errors and Solutions
See the MySQL Error Reference for errors you may encounter. Here are some of the most common ones:
Error 1206: The total number of locks exceeds the lock table size
-- Increase innodb_buffer_pool_size (dynamic as of MySQL 5.7.5;
-- earlier versions require a restart)
SET GLOBAL innodb_buffer_pool_size = 8589934592; -- 8GB
-- Or perform the operation in smaller batches using external tools
Error 1114: The table is full
-- Check available disk space
df -h
-- An external tool such as pt-online-schema-change still needs room
-- for a full copy of the table, but writes it in small, throttleable batches
-- For InnoDB, ensure the tablespace can autoextend;
-- for NDB disk data tablespaces, add a data file:
ALTER TABLESPACE ts_name ADD DATAFILE 'file_name.dat';
Error 1205: Lock wait timeout exceeded
-- Increase lock wait timeout for this session
SET SESSION innodb_lock_wait_timeout = 3600; -- 1 hour
-- Or use a less aggressive locking approach
ALTER TABLE large_table
ADD INDEX idx_column (column_name)
ALGORITHM=INPLACE, LOCK=NONE;
Error 1030: Got error 28 from storage engine
-- Errno 28 means "no space left on device"
-- Free up disk space or move the data directory to larger storage
Advanced Techniques
Use Partial Indexes
-- Index only a prefix of a long string column
CREATE INDEX idx_large_text ON large_table (large_text_column(20));
-- Functional index over a subset of rows (requires MySQL 8.0.13+)
CREATE INDEX idx_partial ON large_table ((CASE WHEN status='active' THEN id END));
Staged Index Creation for Extremely Large Tables
For tables with hundreds of millions of rows, consider a staged approach:
- First create an index that covers only recent or active rows, as in the sketch below. MySQL has no PostgreSQL-style partial indexes (CREATE INDEX accepts no WHERE clause), so emulate one with a functional index (MySQL 8.0.13+) that evaluates to NULL for older rows; the cutoff must be a literal date, since nondeterministic functions such as NOW() are not allowed in index expressions:
CREATE INDEX idx_recent ON large_table ((CASE WHEN created_at > '2024-01-01' THEN column_name END));
- Create additional indexes to cover older data in stages, during separate maintenance windows.
Using Covering Indexes for Large Tables
When querying large tables, covering indexes can dramatically improve performance:
-- Create index that includes all columns used in the query
CREATE INDEX idx_covering ON large_table (search_column, col1, col2, col3);
-- Query can now be served entirely from the index
SELECT col1, col2, col3 FROM large_table WHERE search_column = 'value';
This is especially beneficial for large tables because the query can be answered from the index alone, without touching the table rows.
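To verify that a query is actually covered, check its EXPLAIN output; "Using index" in the Extra column means the result is read from the index alone (reusing the hypothetical names above):
-- Extra: "Using index" confirms an index-only read
EXPLAIN SELECT col1, col2, col3 FROM large_table WHERE search_column = 'value';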