|
1
|
|
|
2
|
- Overview performance objectives of OLTP
- Design, Techniques and Best practices
- Transactions
- Concurrency
- Index maintenance issues
- SQL Server 2005 OLTP Implementation findings
- Identifying Performance issues
- Resource utilization
- Optimization
- Estimation and query plan selection
- Plan re-use & Recompilation
- Useful counters
- OLTP Performance Blueprint
- Summary
|
|
3
|
- OLTP performance objectives
- Fast transactions
- Set operations preferable over Cursors
- Indexes allow granular data access and locking
- Maximizing CPU resources
- High plan re-use
- Low re-compilation
- Maximizing IO resources
- Minimize joins
- Fast transaction log (writelog)
- Small IOs for Data (io_completion)
- See SEAS06PT (Performance Tuning) deck
- Exploit L2 Cache – SQL Server 2005 loves L2/L3 cache
|
|
4
|
- Overview performance objectives of OLTP
- Design, Techniques and Best practices
- Transactions
- Concurrency
- Index maintenance issues
- SQL Server 2005 OLTP implementation findings
- Identifying Performance issues
- Resource utilization
- Optimization
- Estimation and query plan selection
- Plan re-use & Recompilation
- Useful counters
- OLTP Performance Blueprint
- Summary
|
|
5
|
- Application & Database Design and T-SQL have major impact on
performance
- ‘Typical’ performance tuning plays smaller role
- Performance monitoring can point out some deficiencies (or opportunities
for improvement!) throughout application life cycle.
|
|
6
|
- Consistency and Concurrency
- Consider new Isolation levels RCSI and SI to improve SQL Server
2000 behavior of readers blocking writers and writers
blocking readers.
- Short transactions are key for high concurrency in OLTP. SQL Server
will err towards row level locking and improve concurrency
- Performance
- SQL Server is great at inserts, very good at updates and not as good at
deletes
- Can use partitioning to avoid deletes
- SQL Server 2005 zero costs plans can help
|
|
7
|
- Avoid long (or wide) clustered index key if table has nonclustered (N/C)
indexes
- Leaf of Nonclustered index uses the clustered index key (primary key)
to locate the data row
- Since a wide clustered index key increases size of N/C, (covered)
nonclustered range scans results in more IO
- Avoid high volume Clustered index seeks & RID lookups (N/C)
- Clustered index benefits
- high volume lookups (avoids RID lookups)
- Range scans – access to entire data row
- Non-clustered index benefits
- Query covering
- Can be used to avoids sorts
|
|
8
|
- Missing indexes
- Sys.dm_db_missing_index_group_stats
- Sys.dm_db_missing_index_groups
- Sys.dm_db_missing_index_details
- Sys.dm_exec_query_plan(plan_handle) - Look for <MissingIndexes>
- Unused indexes
- Sys.dm_db_index_usage_stats
- Index Access, Blocks, Contention e.g. waits
- Sys.dm_db_index_operational_stats()
- Sys.dm_db_index_physical_stats()
|
|
9
|
|
|
10
|
- Overview performance objectives of OLTP
- Design, Techniques and Best practices
- Transactions
- Concurrency
- Index maintenance issues
- SQL Server 2005 OLTP Implementation findings
- Identifying Performance issues
- Resource utilization
- Optimization
- Estimation and query plan selection
- Plan re-use & Recompilation
- Useful counters
- OLTP Performance Blueprint
- Summary
|
|
11
|
- Procedure Cache Growth
- Zero costs plans can fill up cache quickly with high volume dynamic
transactions
- SQL 2005 plan cache changes resulting in
- Out of Memory errors
- Worse performance on 64-bit vs. 32-bit platform
- Worse performance on SQL 2005 vs. SQL 2000
- Resolution/Workaround
- SP2 addresses 4 main areas:
- We evict plans much faster
- We cap the procedure cache much more aggressively on high-end machines
leaving more memory for data pages
- Certain zero cost plans do not get cached at all
|
|
12
|
- Parameter Sniffing/Forced
parameterization
- Many workloads generate generic SQL based on application end-user input
- Parameterized queries are generally used to aviod excessive compilation
against all database platforms
- First invocation with non-representive parameter values can cause major
problems
- Resolution/Workaround
- Run the workload with most popular plan on startup
|
|
13
|
- SQL Server 2005 can use more CPU
- In-place application upgrades from SS2K to SS2K5 (same hardware) often
result in 20-30% more CPU utilization
- Especially noticeable with batch or serialized operations
- The problem is primarily attributed to:
- Higher query plan compilation costs
- More code/larger working set
- Resolution/Workaround
- In some cases enabling ‘Forced Parameterization’ helps
- True mostly with older Hardware with small L2/L3 cache
|
|
14
|
- Data manipulation of partition table can be more costly than
non-partition table
- Partitioned tables keep a separate rowmodctr for every partition
- This costs a lot of CPU cycles for checking the value, which increases
with each additional partition
- Resolution/Workaround
- Turn auto statistics off
- Add statistics updates to maintenance schedule
|
|
15
|
- Excessive files in a database
- Files are opened serially after recovery restart
Adversely impacts large system availability
- Backup of a database containing lots of files/filegroups can be slower
- Resolution/Workaround
- More files for TempDB and Log is ok
- Reasonable amount of files for data and indexes (depending on size of
file group)
|
|
16
|
- Overview performance objectives of OLTP
- Design, Techniques and Best practices
- Transactions
- Concurrency
- Database design
- Normalization, Denormalization,
- Index maintenance issues
- SQL Server OLTP implementation findings
- Identifying Performance issues
- Resource utilization
- Optimization
- Estimation and query plan selection
- Plan re-use & Recompilation
- Useful counters
- OLTP Performance Blueprint
- Summary
|
|
17
|
- Queuing
- Multiple types of queues (memory, CPU, IO)
- Resource limitations
- Bad configuration
- Bad Queries & Design
- Badly written, poorly designed
- Poor indexing
- Not relevant to workload or lack of
- Inappropriate optimizer plans
- Too many round trips from client
|
|
18
|
- Database shared resources
- Database performance is limited by maximum Transaction Log throughput,
only ONE possible transaction log per database!
- Can be resolved by
- adding multiple spindles
- Increasing number of databases to provide multiple transaction logs
- Server shared resources
- TEMPDB
- Tempdb in memory vs. less memory for buffer cache
- Memory (64-bit) flat (see SEAS06 SQLOS & VLDB)
- Memory (32-bit)
- Only data cache can live in 32-bit AWE
- Proc cache, locks, user connections, sorting restricted to lower 2-3GB
of address space
- Can be resolved by partitioning over multiple instances
- Machine/node shared resources
- CPU and networking
- Can be resolved by partitioning over multiple servers
|
|
19
|
- Database scalability is limited by the maximum throughput of the
transaction log
- Instance scalability is limited by shared “process” level resources
- Server scalability is limited by shared “server”/”machine” level
resources
- CPU (incl. L1 & L2 cache)
- Network bandwidth
|
|
20
|
- Determine I/O pattern
- Writes
- Transaction Log (~100% sequential)
- Lazy Writer (random)
- Read
- Establish disk I/O baseline or SLA outside SQL Server, using:
- SQLIO or IOMeter (Intel, public domain)
- Special cases:
- Transaction log
- 1 Tempdb file for each cpu
- Max Parallel BCP load = 1 BCP / CPU
- Into SQL Server 2005 partitioned tables
|
|
21
|
- I/O bottlenecks are typically easy to find
- Be very careful with the transaction log
- Beyond 12 to 15 spindles doesn’t buy much
- Keep on separate physical disks for recovery
- Make RAID 10
- Beware of write cost on RAID5:
- In RAID 5 each write has to logically read old data + old parity (to
compute parity) and write new data and new parity
- Each RAID5 write = 2 READS + 2 WRITES !
- However: Disk guys work real hard to optimize this
- Recent bulk load tests showed >50% degradation comparing RAID 0+1
vs. RAID 5
|
|
22
|
- Disk subsystem based on I/O throughput required, not size of DB
- E.g. 1TB data / 72GB per drive = 14 drives.
- Will 14 drives provide sufficient IO throughput?
- Recommend more smaller drives
- Random (OLTP) vs. sequential (Reporting) IO/sec
- Cache on controller – tuned for % read or write
- Consider all workloads
- OLTP (typically random IOs)
- Batch (could be random or sequential depending on the type of work
done)
- Use SQLIO to measure your max throughput rating for your subsystem.
|
|
23
|
- Profile the log disk
- How many writes / second can your disk sustain?
- Keep the log disk purely for the log
- Keeps the disk heads writing sequentially minimizing seeks
- Beware of unprotected write back caches
- If power fails, you could lose the entire database – not just the last
couple of transactions!
- Check with your SAN / Disk controller vendor
|
|
24
|
- Blocking between sessions can occur due to a combination of incompatible
locks and waits on resources
- Tools
- Use Profiler block process report and other tools to find blocking
processes
- DMVs
- New blocking solutions
- Snapshot Isolation - Row Versioning
- See SEAS06PT for locking discussion
|
|
25
|
- DMF sys.dm_db_index_operational_stats() identifies the contention points
- Row locks counts
- Row lock waits counts
- Total wait time for blocks
- Compute blocking percentage and average wait times
- See SEAS06PT Indexes & Row Lock Waits.sql
- Sys.dm_os_waiting_tasks
|
|
26
|
- Sp_block_info – lists real time blocks
- Trace – for historical analysis
- Capture long blocks using the Trace Event “Block Process Report”
- Sp_configure “blocked process threshold”,15 (seconds)
- This is covered in SEAS06PT
- If blocking is still an issue, Consider row versioning to minimize read
/ write contention
|
|
27
|
- Row versioning-based isolation levels
- Always read a committed value (as compared with dirty reads)
- Reads do not acquire shared (S) locks
- improve concurrency by eliminating blocks for read / write operations.
- Tempdb overhead
- Stores versions of previously committed row data
- RCSI
- Advantage: NO APPLICATION
CHANGES !
- Transaction Isolation Level Read Committed & Read_Committed_Snapshot ON database
option
- Statement level read consistency
- Transaction Isolation Level Snapshot
- Transaction level read consistency
- Database Snapshot
|
|
28
|
- Tempdb usage is much more common in SS2005
- Tempdb management must be a configuration priority for DBAs
- The following uses Tempdb w/ SS2005
|
|
29
|
- On line index:
- 2x-3x size of index – Sort size, temp index and rollback
- Versioning:
- [Size of Version Store] = 2 *
[Version store data generated per minute] *
[Longest running time (minutes) of your transaction] * number
of concurrent transactions/users
- Note: Version store data
generated per minute and version store size are now perfmon parameters
- Recommendation repeated: Be sure to tune TempDB for proper sizing as
well as performance
|
|
30
|
- Reduces sgam contention
- Still needed in 2005 if you have DDL statements for Create Table and
Create Index in stored procedures that are called many times (high
volume).
|
|
31
|
- select
- sum(user_object_reserved_page_count)*8 as user_objects_kb,
- sum(internal_object_reserved_page_count)*8 as internal_objects_kb,
- sum(version_store_reserved_page_count)*8 as version_store_kb,
- sum(unallocated_extent_page_count)*8 as freespace_kb
- from sys.dm_db_file_space_usage
- where database_id = 2
|
|
32
|
- SELECT t1.session_id,
- (t1.internal_objects_alloc_page_count + task_alloc) as allocated,
- (t1.internal_objects_dealloc_page_count + task_dealloc) as deallocated
- , t3.sql_handle, t3.statement_start_offset
- , t3.statement_end_offset, t3.plan_handle
- from sys.dm_db_session_space_usage as t1,
- sys.dm_exec_requests t3,
- (select session_id,
-
sum(internal_objects_alloc_page_count) as task_alloc,
- sum
(internal_objects_dealloc_page_count) as task_dealloc
- from sys.dm_db_task_space_usage
group by session_id) as t2
- where t1.session_id = t2.session_id and t1.session_id >50
- and t1.database_id = 2 ---
tempdb is database_id=2
- and t1.session_id = t3.session_id
- order by allocated DESC
|
|
33
|
- Database snapshots do consume resources on your server.
- Tested – TPC-C workload had 15% performance loss with single Database
snapshot
- The more database snapshots, the more performance will be impacted.
|
|
34
|
- Veritas (formerly Precise) InDepth for SQL Server
- Excellent tool for identifying
- Resource bottlenecks
- Resources consumed by statement
- Waits by statement
- Performance history
- Quest Software
- Great tools for monitoring
- Partition management
- Backup with compression (Litespeed)
|
|
35
|
- Server has two distinct and optimized code paths
- Goal is to utilize the correct code path!
- Language event
- Every statement not being a (stored) procedure
- extra parsing required to figure out what is in the string
- Adhoc query plans for string (in addition to Stored Proc plans)
- Generic code which executes procedures via a language event, for
example OSQL, Query Analyzer etc.
- RPC event
- Stored procedure invocations using {call} syntax
- Increases performance
- eliminates parameter processing and statement parsing
|
|
36
|
- OLTP Benchmark lessons
- Big performance gains from best practices
- Use efficient row length and data types
- Every byte counts, use correct types
- Match packet size and batch size
- Perf of ‘Bind’ on client proportional to batch size
- For large batches, avoid ODBC Parameter binding with ?
- ODBC {Call Proc} better than execute proc syntax
- {call dbo.qi ('M01', 'M01.0407040000000002')}
- exec dbo.qi @v1='M01', @v2='M01.0407040000000002' –adds ADHOC query
plans due to SQL string parsing
- Net gain using above - 7x
|
|
37
|
- You always fetch all results and all result sets!
- Un-fetched results and result sets can cause concurrency issues on the
server
- Un-fetched results and result sets will cause an attention signal to be
send to the server to cancel the pending stream
- SET NOCOUNT ON
- Avoid unnecessary round trips of sending empty result sets for INSERT,
UPDATE and DELETE statements
|
|
38
|
- Master..Sys.dm_exec_cached_plans
- Procedure or batch name
- Set options for plans
- Ref counts, Use counts
- Compiled plan
- Single copy (serial and parallel)
- Re-entrant and re-usable
- Statement level recompilation
- Executable plan
- Data structure for user context, not re-entrant
- Look for plan reuse: usecounts > 1
- Plan re-use of
- Procs, Triggers, Views
- Defaults, Check constraints, rules
- adhoc SQL, sp_executesql
|
|
39
|
- SQL Batch requests/sec
- Compare to initial SQL Compilations/sec
- SQL Compilations/sec
- Includes initial compiles AND re-compiles
- Eliminate re-compilations to get initial compiles
- Look for identical SQL statements with low usecounts in
Sys.dm_exec_cached_plans
- See SEAS06PT:Worst plan re-use by statement.sql
- SQL Re-compilations/sec
- Statement Level Recompiles
- Sys.dm_exec_query_stats (plan_generation_num)
- when incremented indicates recompilation
- Check profiler for sp:recompile event to identify SQL statement.
- http://www.microsoft.com/technet/prodtechnol/sql/2005/recomp.mspx
|
|
40
|
- Waiting to run
- Runnable queue – pure CPU waits
- CPU pressure measured by signal waits
- Plan compilation & requests
- Perfmon: SQLServer:SQL Statistics
- Batch requests / sec { >1000’s/sec server is busy}
- SQL Compilations / sec {>10s/sec could be problem}
- SQL Recompilations / sec {OLTP should avoid high recomps}
- Ratio of compiles / requests is important
- Compiles – recompiles = initial compiles
- Plan re-use = (Batch requests – initial compiles) / Batch requests
- (compared with batch requests, low initial compiles indicates plan
re-use)
- Recompile reasons:
- Change in schema state – schema altered, etc.
- Previously parallelized plan needs to run serially
- Statistics recomputed
- Rows changed threshold – sys.sysindexes.rowmodctr
|
|
41
|
- CPU used for plan determination
- OLTP characterized by high numbers of identical small transactions
- Plan re-use desirable
- See usecounts in Sys.dm_exec_cached_plans
- Stored procedure estimates are based on initial parameter values
- Re-use is generally good for OLTP,
- re-use can be bad when when results sets can significantly vary in size.
|
|
42
|
- Plan selection is based on estimates
- Overestimation
- Favors fixed cost (hash) strategy
- Extreme cases can improve
- with LOOP JOIN hint
- Execute P1 with recompile
- Underestimation
- Favors variable cost (e.g. nested loops) strategy
- Extreme cases can improve with HASH option
- Set Statistics Profile on
- Shows estimates vs. actuals
- Look for huge differences (examples)
- OverEstimates are 100x actuals
- UnderEstimates are 1% actuals
|
|
43
|
- Plan re-use (or lack of)
- Compare batch requests to SQL compiles/sec
- IO
- Recompilation
- Cache hit, insert, miss, remove
- Index usage (or lack of)
- Object access
|
|
44
|
- The Profiler events that track cache management include:
- SP:CacheMiss (event ID 34 in Profiler)
- SP:CacheInsert (event ID 35 in Profiler)
- SP:CacheRemove (event ID 36 in Profiler)
- SP:Recompile (event ID 37 in Profiler)
- SP:CacheHit (event ID 38 in Profiler)
- SP:Starting lists stored procedure execution
- SP:StmtStarting will show corresponding SQL statement
- Example: sequence is
- SP:StmtStarting
- SP:CacheMiss (no plan found)
- SP:CacheInsert (plan created)
- Watch out: Heavy profiler use will affect performance !
- Add Eventsubclass data column to display recompilation reason
|
|
45
|
- Plan determination is CPU Intensive
- Recomp good if benefit of new plan > CPU cost
- Profiler
- Lists recomp events and statements
- Data column for reason: EventSubClass
- locks on system tables
- Re-compiling stored procedure plans serialize other users during high
concurrency
- places lock on single compile plan
- Re-compilation based on
- Rows changed thresholds (rowmodctr)
- DDL placement, schema changes
- Code practice & temp tables (P1 & P2)
|
|
46
|
|
|
47
|
- SS Databases: Log Flush Wait time
- SS Databases: Log Flush Waits/sec
- SS General Statistics: User Connections
- SS Latches: Average Latch Wait Time(ms)
- SS Latches: Latch Waits/sec
- SS Latches: Total Latch Wait Time (ms)
- SS Locks: Average Wait Time(ms)
- SS Locks: Lock requests/sec
- SS Locks: Lock Wait Time (ms)
- SS Locks: Lock Waits/sec
- SS Memory Manager: Memory grants pending
- SS SQL Statistics: Auto-Params attempts/sec
- SS SQL Statistics: Batch requests/sec
- SS SQL Statistics: Safe Auto-Params/sec
- SS SQL Statistics: SQL Compilations/sec
- SS SQL Statistics: SQL Re-Compilations/sec
- System: Processor Queue Length
- Memory: Page faults/sec
- Memory: pages/sec
- Physical Disk: Avg. Disk Queue Length
- Physical Disk: Avg. Disk sec/Transfer
- Physical Disk: Avg. Disk sec/Read
- Physical Disk: Avg. Disk sec/Write
- Physical Disk: Current Disk Queue Length
- Processor: %Processor Time
- SS Access Methods: Forwarded Records/sec
- SS Access Methods: Full Scans/sec
- SS Access Methods: Index Searches/sec
- SS Access Methods: Page Splits/sec
- SS Access Methods: Range Scans/sec
- SS Access Methods: Table Lock escalations/sec
- SS Buffer Manager: Checkpoint pages/sec
- SS Buffer Manager: Lazy writes/sec
- SS Buffer Manager: Page Life expectancy
- SS Buffer Node:Foreign Pages
- SS Buffer Node:Page Life expectancy
- SS Buffer Node:Stolen Pages
|
|
48
|
- Overview
- What are the characteristics of OLTP?
- What are the goals of OLTP?
- Design, Techniques and Best practices
- Transactions
- Concurrency
- Database design
- Normalization, Denormalization,
- Index maintenance issues
- Identifying Performance issues
- Resource utilization
- Optimization
- Estimation and query plan selection
- Plan re-use & Recompilation
- Useful counters
- OLTP Performance Blueprint
- Summary
|
|
49
|
|
|
50
|
|
|
51
|
|
|
52
|
|
|
53
|
|
|
54
|
- Overview performance objectives of OLTP
- Design, Techniques and Best practices
- Transactions
- Concurrency
- Database design
- Normalization, Denormalization,
- Index maintenance issues
- High End implementation findings
- Identifying Performance issues
- Resource utilization
- Optimization
- Estimation and query plan selection
- Plan re-use & Recompilation
- Useful counters
- OLTP Performance Blueprint
- Summary
|
|
55
|
- Challenge: Scheduling a mix workload evenly across Schedulers
- Database Log to handle 60,000+ database tx/sec
- Real time reporting and loading data
- Indexes are both good and bad
- OLTP general goal: limit recompiles
- Multiple database logs for scalability
- Read-only queries: consider another database via replication, log
shipping or Shared Scalable Database
|
|
56
|
- Database design driven by workload requirements
- Indexes
- Denormalization decisions
- Transactions
- Maximizing resources
- Plan re-use – normally desirable for OLTP
- Recompilation – generally try to avoid with OLTP
- Set based operations more efficient than cursors
- Reduce parallel queries to improve concurrency
- Sp_configure “max degree of parallelism”,1 -- turns off
- Check for good query plans – set statistics profile on
- Good data access – see Benchmark lessons
|
|
57
|
- OLTP applications require appropriate
- database design
- Transaction usage
- High concurrency - must minimize blocking
- Application design
- Use code coding techniques for plan re-use, minimize recompiles
- API
- Maximize performance with most efficient calls
- Access methods
- Efficient query plans for OLTP
|
|
58
|
- SQL Server 2005 Batch Compilation, Recompilation, and Plan Caching
Issues
- http://www.microsoft.com/technet/prodtechnol/sql/2005/recomp.mspx
- SQL Customer Advisory Team internal site
- http://sqlserver/sites/sqlcat
- SQL Customer Advisory Team blog
- http://blogs.msdn.com/sqlcat
- SQL Server Webcasts
- http://www.microsoft.com/technet/prodtechnol/sql/webcasts/default.mspx
|
|
59
|
|