ABOUT ME

-

Today
-
Yesterday
-
Total
-
  • Explain Query Plan and Query Performance with Postgres
    DB/RDB 2019. 8. 23. 09:34

    1. Overview

    Let's analyze the query plan with Postgres to enhance performance using explain and analysis.

    2. Description

    2.1 Optimizer

    • The brain of the database, which interprets queries and determines the fastest method of execution. A single query optimization technique can increase database performance drastically.

    2.2 Explain & Analyze

    2.2.1 Explain

    • Displaying the execution plan that the PostgreSQL planner generates for the supplied statement
    • Gives an exact breakdown of a query.
    • Based on the statistics about the table
    • Identify the most efficient path to the data
    • Takes different database paradigms into consideration such as indexes
    • Only guessing a plan that it thinks it will execute

    2.2.2 Analyze

    • Basically runs a query to find the processing time to execute

    3. Down to the nuts and bolts

    Most commonly occurring matches are scanned using the seq scan and the least common matches are scanned using an index scan, anything in between is scanned using a bitmap heap scan followed by an index scan. One of the reasons for this is that random I/O is very slow as compared to sequential I/O.

    3.1 Bitmap scan (Sequential Scan)

    • A sequential scan
    • Sequentially open a shortlist of disk pages and grab every applicable row in each one.

    3.2 Index scan

    • The index is visited row by row in order
    • Results in disk page being visited multiple times

    3.3 Comparison between Sequential scan and index scan

    Cases where a sequential scan is faster than an index scan

    • Usually faster than reading in random order because an index scan requires several I/O for each row which includes looking up a row in the index looking up and retrieving the row from memory
    • The sequential scan requires a single I/O operation to retrieve more than one block containing multiple rows.

    3.4 Query plan on JOINS

    3.4.1 Nested Loop

    • Can use either a sequential scan or index scan
    • sequential scan used when the second table is small
    • The basic logic of choosing between a sequential scan and index scan applied here too

    3.4.2 Hash Join

    • Creating a hash table of the smaller table on the join key
    • The larger table is scanned, searching the hash table for the rows which meet the join condition
    • Require a lot of memory to store the hash table in the first place

    3.4.3 Merge Join

    • Similar to merge sort algorithm
    • Planner sorts both tables to be joined on the join attribute
    • Scanned in parallel to fine the matching values

    4. Index

    4.1 Multiple column indexes

    • A multi-column B-Tree index
    • Most efficient when there are constraints on the leading columns(leftmost).

    4.2 Cover-index

    When an index includes the actual values of all the fields specified in the query, the index covers the query and does not require an additional step to fetch the actual values from the data service. An index, in this case, is called a covering index and the query is called a covered query. As a result, covered queries are faster and deliver better performance.

    4.2.1 Without Covering Indexes

    4.2.2 With Covering Indexes

    4.3 Unique index

    • Enforce uniqueness of a column's value or the uniqueness of a combined value of more than one column.
    • Postgres creates an index for the defined primary key itself.
    • Creating unique constraint rather than unique index by the Postgres product team
    • When we create a unique constraint, Postgres automatically creates an index behind the scene.

    4.4 Examples

    4.4.1 index is not used

    • LIKE '%scheme', but LIKE 'sheme%' can use the index
    • The upper/lower case function used in where clause
      • or Creating lower case index like below
      • CREATE INDEX idx_scheme_name ON schemes (lower(scheme_name))

    4.5 Partial Index

    • A subset of the rows of a table
    • CREATE INDEX idx_scheme_name ON schemes start_time WHERE start_time < '11:00:00'
    • Have not indexed all the schemes where start_time is after 11:00 AM

    5. Downsides fo indexes in production environments

    5.1 Finding unused indexes

    • Indexed eat memory

    5.2 CREATE/DROP index and CREATE/DROP index concurrently

    • Creating and dropping an index in a large database can take hours or even days and the CREATE INDEX command blocks all the writers on a table
    • CREATE INDEX CONCURRENT will not acquire locks against writes
      • Firstly, scans the table to build indexes 
      • Run the index once again for the things to be added since the first pass
    • If something goes wrong during creating an index concurrently, it does not roll back and leaves an invalid index behind can find these like below
      • SELECT * FROM pg_class, pg_index WHERE pg_index.indisvalid = false AND pg_index.indexrelid = pg_class.oid;

    5.3 Rebuilding indexes

    • REINDEX rebuilds an index using the data stored in the index table replacing the old copy of the index
      • REINDEX INDEX or REINDEX TABLE
    • REINDEX locks out writes but not reads of the index's parent table.
    • Also takes an exclusive lock on the specific index being processed, which will block reads that attempt to use that index.

    6. References

    http://dwgeek.com/improve-performance-netezza-query-plan.html/

    https://www.vertica.com/kb/Reading-Query-Plans/Content/BestPractices/Reading-Query-Plans.htm

    https://docs.oracle.com/cd/E17952_01/workbench-en/wb-performance-explain.html

    https://www.hwaci.com/sw/sqlite/eqp.html

    https://blog.gojekengineering.com/the-postgres-performance-tuning-manual-query-plans-52a023c2342d

    https://www.postgresql.org/docs/11/using-explain.html#USING-EXPLAIN-ANALYZE

    https://blog.gojekengineering.com/the-postgres-performance-tuning-manual-indexes-19a1bdfe28ba

    https://wiki.postgresql.org/wiki/Index_Maintenance

    https://docs.couchbase.com/server/current/n1ql/n1ql-language-reference/covering-indexes.html

    'DB > RDB' 카테고리의 다른 글

    Dirty Read  (0) 2019.09.30
    Phantom Read  (0) 2019.09.30
    ACID  (0) 2019.09.30
    Transactional locking mechanism and Multi-version concurrency control(MVCC)  (0) 2019.09.30
    OLTP, OLAP, and ETL  (0) 2019.09.11

    댓글

Designed by Tistory.