2.2 Using SQL Pattern Matching

«« Previous
Next »»

1. Overview


Purpose

This tutorial covers SQL for Pattern Matching. Row pattern matching in native SQL improves application and development productivity and query efficiency for row-sequence analysis.

Time to Complete

Approximately one to two hours

Introduction

Recognizing patterns in a sequence of rows has been a capability that was widely desired, but not possible with SQL until now. There were many workarounds, but these were difficult to write, hard to understand, and inefficient to execute. With Oracle Database 12c Release 1 (12.1), you can use the MATCH_RECOGNIZE clause to perform pattern matching in SQL to do the following:
  1. Logically partition and order the data that is used in the MATCH_RECOGNIZE clause with its PARTITION BY and ORDER BY clauses.
  2. Define patterns of rows to seek using the PATTERN clause of the MATCH_RECOGNIZE clause. These patterns use regular expressions syntax, a powerful and expressive feature, applied to the pattern variables you define.
  3. Specify the logical conditions required to map a row to a row pattern variable in the DEFINE clause.
  4. Define measures, which are expressions usable in the MEASURES clause of the SQL query.
Scenario (Optional)

The syntax for pattern expressions incorporates regular expressions and full conditional logic, enabling precise and flexible pattern definition. Whatever the domain is (for example, financial market prices, internet clicks, or security sensor output), applications analyzing row sequences can benefit from MATCH_RECOGNIZE.

Hardware and Software Requirements (Optional)

The following is a list of hardware and software requirements:
  • Oracle Database 12c Release 1 (12.1) Enterprise Edition for linux
  • Oracle SQL Developer version 3.0 or higher (optional)
  • Mozilla Firefox Web browser
  • A linux text editor such as gedit
Prerequisites

Before starting this tutorial, you should:
  • Install Oracle Database 12c Release 1 (12.1) Enterprise Edition for linux
  • Have access to or have installed the sample schemas
  • Install Oracle SQL Developer
  • Download the files.zip file which contains the script files used in this tutorial to your local drive. In this tutorial, we copied the downloaded required files.zip file to the /home/oracle/files folder on a linux machine
  • The code examples in this tutorial use the OE schema included in the Oracle Database 12c Release 1 (12.1) . The Oracle Database Sample Schemas 12c Release 1 (12.1) is part of the Oracle Database 12c Release 1 (12.1) documentation set

2. Using Pattern Matching 


In this section, you learn the benefits of using pattern matching and some of the tasks and keywords you can use.
  1. Why Use Pattern Matching?
The ability to recognize patterns found across multiple rows is important for many types of applications. Examples include all kinds of business processes driven by sequences of events. Examples are:
  • Financial applications seeking patterns of pricing, trading volume, and other behavior
  • Security applications where unusual behavior must be detected
  • Fraud detection applications
  • Sensor data analysis. One term describing this general area is complex event processing, and pattern matching is a powerful aid to this activity.
Tasks and Keywords in Pattern Matching

Let us go over some of the tasks and keywords used in pattern matching.

Task/Keyword
Description
PARTITION BYLogically divide the rows into groups at a high level
ORDER BYLogically order the rows in partition
ONE ROW PER MATCHReturns one summary row of output for each match
ALL ROWS PER MATCHReturns one detail row of output for each row of each match
MEASURESDefines row pattern measure columns
PATTERNDefines which pattern variables must be matched, the sequence in which they must be matched, and the quantity of rows which must be matched
DEFINESpecifies the conditions that defines a pattern variable
AFTER MATCHRestart the matching process after a match is found
MATCH_NUMBERFinds which pattern variable applies to which rows
CLASSIFIERIdentifies which component of a pattern applies to a specific row

PARTITION BY: Typically, you want to divide your input data into logical groups for analysis. In the stock examples, you want to divide the pattern matching so that it applies to just one stock at a time. You do that with PARTITION BY which specifies that the rows of the input table are to be partitioned by one or more columns. Matches are found within partitions and do not cross partition boundaries. If there is no PARTITION BY, then all rows of the row pattern input table constitute a single row pattern partition.

ORDER BY: Used to specify the order of rows within a row pattern partition. If the order of two rows in a row pattern partition is not determined by the ORDER BY, then the result of MATCH_RECOGNIZE is non-deterministic: it may not give consistent results each time the query is run.

[ONE ROW | ALL ROWS] PER MATCH]: Sometimes you may need summary data about the matches while other times, you may need details. You can do that as follows: ONE ROW PER MATCH gives you one row of output for each match. ALL ROWS PER MATCH gives you one row of output for each row of each match. This is the default.

MEASURES: The MEASURES clause defines a list of columns for the pattern output table. Each pattern measure column is defined with a column name whose value is specified by a corresponding pattern measure expression.

PATTERN: Specifies the pattern to be recognized in the ordered sequence of rows in a partition. Each variable name in a pattern corresponds to a Boolean condition, which is specified later using the DEFINE component of the syntax. The PATTERN clause is used to specify a regular expression. The regular expression is enclosed in parentheses.

DEFINE: Because the PATTERN clause depends on pattern variables, you must have a clause to define these variables. They are specified in the DEFINE clause (required), and are used to specify the conditions that defines a pattern variable.

AFTER MATCH: Once the query finds a match, it is vital that it begins looking for the next match at exactly the right point. Do you want to find matches where the end of the earlier match overlaps the start of the next match? Or do you want some other variation? Pattern matching provides great flexibility in specifying the restart point.

MATCH_NUMBER: You might have a large number of matches for your pattern inside a given row partition. How do you tell all these matches apart? This is done with the MATCH_NUMBER function. Matches within a row pattern partition are numbered sequentially starting with 1 in the order they are found. Note that match numbering starts over again at 1 in each row pattern partition, because there is no inherent ordering between row pattern partitions.

CLASSIFIER: Along with knowing which MATCH_NUMBER you are seeing, you may want to know which component of a pattern applies to a specific row. This is done using the CLASSIFIER function. The classifier of a row is the pattern variable that the row is mapped to by a row pattern match. The classifier of a row that is not mapped by a row pattern match is null. The CLASSIFIER function returns a character string whose value is the classifier of a row.

Pattern Match Example: Stock Chart

Pattern matching enables you to identify price patterns, such as V-shapes and W-shapes illustrated in the following chart along with performing many types of calculations. For example, your calculations might include the count of observations or the average value on a downward or upward slope. The following chart uses the stock price which you can load into your database with the CREATE and INSERT statements. You can then issue a query to find all cases where stock prices dipped to a bottom price and then rose. This is generally called a V-shape.

The following steps use the stock price graph. You can load the stock chart example into your database using the SQL CREATE and INSERT statements that follow. The scripts are also found in the files.zip file.

Using SQL Pattern Matching - Series

1. Create the ticker table as follows:

CREATE TABLE ticker (SYMBOL VARCHAR2(10), tstamp DATE, price NUMBER);

2. Next, populate the newly created ticker table with some data using the SQL INSERT statements as follows:

INSERT INTO Ticker VALUES('ACME', '01-Apr-11', 12);
INSERT INTO Ticker VALUES('ACME', '02-Apr-11', 17);
INSERT INTO Ticker VALUES('ACME', '03-Apr-11', 19); 
INSERT INTO Ticker VALUES('ACME', '04-Apr-11', 21);
INSERT INTO Ticker VALUES('ACME', '05-Apr-11', 25); 
INSERT INTO Ticker VALUES('ACME', '06-Apr-11', 12); 
INSERT INTO Ticker VALUES('ACME', '07-Apr-11', 15); 
INSERT INTO Ticker VALUES('ACME', '08-Apr-11', 20); 
INSERT INTO Ticker VALUES('ACME', '09-Apr-11', 24); 
INSERT INTO Ticker VALUES('ACME', '10-Apr-11', 25); 
INSERT INTO Ticker VALUES('ACME', '11-Apr-11', 19); 
INSERT INTO Ticker VALUES('ACME', '12-Apr-11', 15); 
INSERT INTO Ticker VALUES('ACME', '13-Apr-11', 25); 
INSERT INTO Ticker VALUES('ACME', '14-Apr-11', 25); 
INSERT INTO Ticker VALUES('ACME', '15-Apr-11', 14); 
INSERT INTO Ticker VALUES('ACME', '16-Apr-11', 12); 
INSERT INTO Ticker VALUES('ACME', '17-Apr-11', 14); 
INSERT INTO Ticker VALUES('ACME', '18-Apr-11', 24); 
INSERT INTO Ticker VALUES('ACME', '19-Apr-11', 23); 
INSERT INTO Ticker VALUES('ACME', '20-Apr-11', 22);

3. Enter the following query which demonstrates Simple V Shape With 1 Row Output per Match:

SELECT * 
FROM Ticker MATCH_RECOGNIZE ( 
PARTITION BY symbol 
ORDER BY tstamp 
MEASURES STRT.tstamp AS start_tstamp, 
LAST(DOWN.tstamp) AS bottom_tstamp, 
LAST(UP.tstamp) AS end_tstamp 
ONE ROW PER MATCH 
AFTER MATCH SKIP TO LAST UP 
PATTERN (STRT DOWN+ UP+) 
DEFINE 
DOWN AS DOWN.price < PREV(DOWN.price), 
UP AS UP.price > PREV(UP.price) ) MR
WHERE symbol='ACME'
ORDER BY MR.symbol, MR.start_tstamp;

4. The output of the preceding code is displayed:

Using SQL Pattern Matching - Series

Before examining the query, let us look at the output.

There are only three rows because we chose to report just One Row Per Match, and three matches were found. MATCH_RECOGNIZE lets you choose between showing one row per match or all rows per match.In this example, we use the shorter output of one row per match.

What does this query do? Let us explain each line in the MATCH_RECOGNIZE clause:

PARTITION BY divides the data from the ticker table into logical groups where each group contains one stock symbol.

ORDER BY orders the data within each logical group by tstamp.

MEASURES defines three measures:
  • The timestamp at the beginning of a V-shape (start_tstamp)
  • The timestamp at the bottom of a V-shape (bottom_tstamp)
  • The timestamp at the end of the a V-shape (end_tstamp)

The bottom_tstamp and end_tstamp measures use the LAST() function to ensure that the values retrieved are the final values of the timestamp within each pattern match.
ONE ROW PER MATCH means that for every pattern match found, there is one row of output.

AFTER MATCH SKIP TO LAST UP means that whenever a match is found, the search is restarted at the row which is the last row of the UP pattern variable. A pattern variable is a variable used in a MATCH_RECOGNIZE statement, and is defined in the DEFINE clause described below.

PATTERN (STRT DOWN+ UP+) indicates that the search pattern has three pattern variables: STRT, DOWN, and UP. The "+" after DOWN and UP means that at least one row must be mapped to each of them. The pattern defines a regular expression, which is a highly expressive way to search for patterns.

DEFINE specifies the conditions that must be met for a row to map to our row pattern variables STRT, DOWN, and UP. Because there is no condition for STRT, any row can be mapped to STRT. Why bother having a pattern variable with no condition? We use it as a starting point for testing for matches. Both DOWN and UP take advantage of the PREV() function, which lets them compare the price in the current row to the price in the prior row. DOWN is matched when a row has a lower price than the row that preceded it, so it defines the downward (left) leg of our V-shape. A row can be mapped to UP if the row has a higher price than the row that preceded it.

5. Let us look at two charts to better understand the results returned the previous query. The following chart helps in identifying which dates are mapped to which pattern variables.

Using SQL Pattern Matching - Series

The chart shows the dates mapped to specific pattern variables, as specified in the PATTERN clause. Once the mappings of pattern variables to dates are available, that information is used by the MEASURES clause to calculate the measure values. The chart labels every date mapped to a pattern variable. The mapping is based on the pattern specified in the PATTERN clause and the logical conditions specified in then DEFINE clause. The thin vertical lines show the borders of the three matches that were found for the pattern. In each match, the first date has the STRT pattern variable mapped to it (labeled as "Start"), followed by one or more dates mapped to the DOWN pattern variable, and finally, one or more dates mapped to the UP pattern variable. Since the AFTER MATCH SKIP TO LAST UP in was specified in the query, two adjacent matches can share a row. That means a single date can have two variables mapped to it. For example, 10-April has both the pattern variables UP and STRT mapped to it: April 10 is the end of Match 1 and the start of Match 2.

6. Let us look at the second chart which helps identifying to which dates the measures correspond:

Using SQL Pattern Matching - Series

In the chart, the measures defined in the MEASURES clause of the query are labeled solely: START (start_tstamp in the query), BOTTOM (bottom_tstamp in the query), and END (end_tstamp in the query). As in the previous chart, the thin vertical lines show the borders of the three matches found for the pattern. Every match has a Start date, a Bottom date, and an End date. As with the previous chart, the date 10-April is found in two matches:

It is the END measure for Match 1 and the START measure for Match 2. The labeled dates in the chart show which dates correspond to the measure definitions, which are in turn based on the pattern variable mappings shown in the first chart.

Note that the dates labeled in the chart correspond exactly to the nine dates shown earlier in the output of the example. The first row of the output has the dates shown in Match 1, the second row of the output has the dates shown in Match 2, and the third row of the output has the dates shown in Match 3.


2. How is Data Processed in Pattern Matching?


Let us identify how data is processed in pattern matching using the MATCH_RECOGNIZE clause.

  1. The row pattern input table is partitioned according to the PARTITION BY clause. Each row pattern partition consists of the set of rows of the row pattern input table that are equal on the row pattern partitioning columns.
  2. Each row pattern partition is ordered according to the ORDER BY clause.
  3. Each ordered row pattern partition is searched for matches to the PATTERN.
  4. Pattern matching operates by seeking the match at the earliest row, considering the rows in a row pattern partition in the order specified by the ORDER BY. Pattern matching in a sequence of rows is an incremental process, with one row after another examined to see if it fits the pattern. With this incremental processing model, at any step until the complete pattern has been recognized, we only have a partial match and we do not know what rows might be added in the future, nor what variables those future rows might be mapped to. If no match is found at the earliest row, the search moves to the next row in the partition, checking to see if a match can be found starting with that row.
  5. After a match is found, row pattern matching calculates the row pattern measure columns, which are expressions defined by the MEASURES clause.
  6. Using ONE ROW PER MATCH, as shown in the first example, pattern matching generates one row for each match that is found. If you use ALL ROWS PER MATCH, every row which is matched is included in the pattern match output.
  7. The AFTER MATCH SKIP clause determines where row pattern matching resumes within a row pattern partition after a non-empty match has been found. In the example above, row pattern matching resumes at the next row after the rows mapped by a match (AFTER MATCH SKIP TO LAST UP).

3. Additional Pattern Matching Examples and Features



In this section, we will look at some additional pattern matching examples and some (not all) additional features of pattern matching.

  • Pattern Match for a Simple V-Shape With All Rows Output Per Match
1. Enter the following query which demonstrates pattern match for a Simple V-Shape With All Rows Output Per Match.


-- The first line in the example is used to improve formatting if you are using SQL*Plus.
COLUMN var_match FORMAT A4 
SELECT * FROM Ticker MATCH_RECOGNIZE ( 
PARTITION BY symbol 
ORDER BY tstamp 
MEASURES STRT.tstamp AS start_tstamp, 
FINAL LAST(DOWN.tstamp) AS bottom_tstamp, 
FINAL LAST(UP.tstamp) AS end_tstamp, 
MATCH_NUMBER() AS match_num, 
CLASSIFIER() AS var_match 
ALL ROWS PER MATCH 
AFTER MATCH SKIP TO LAST UP 
PATTERN (STRT DOWN+ UP+)
DEFINE
DOWN AS DOWN.price < PREV(DOWN.price), 
UP AS UP.price > PREV(UP.price) ) MR 
WHERE symbol='ACME'
ORDER BY MR.symbol, MR.match_num, MR.tstamp;

What does this query do? It's very similar to the earlier query except for items in the MEASURES clause, the change to , and a change to the ORDER BY at the end of the query. In the MEASURES clause, there are the following additions:

  • MATCH_NUMBER() AS match_num - Because this example returns multiple rows per match, we want to know which rows are members of which match. MATCH_NUMBER assigns the same number to each row of a specific match. For instance, all the rows in the first match found in a row pattern partition are assigned the MATCH_NUMBER value of 1. Note that match numbering starts over again at 1 in each row pattern partition.
  • CLASSIFIER() AS var_match - We would like to know which rows map to which variable, and the CLASSIFIER function lets us do this. In this example, some will map to the STRT variable, some rows the DOWN variable, and others, the UP variable.
  • FINAL LAST() syntax was added to the bottom_tstamp and end_tstamp measures. By specifying FINAL and using the LAST() function for bottom_tstamp, every row inside each match shows the same date for the bottom of its V-shape. Similarly, applying FINAL LAST() to the end_tstamp measure makes every row in each match show the same date for the end of its V-shape. Without this syntax, the dates shown would be the running value for each row.

Changes were made in two other lines:
  • ALL ROWS PER MATCH - While the earlier example gave us a summary with just 1 row about each match using the line ONE ROW PER MATCH, this example asks to show every row of each match.
  • ORDER BY on the last line - This was changed to take advantage of the MATCH_NUM. We want to all rows in the same match together and in chronological order.
2. Let us look at the output of the query:

Using SQL Pattern Matching - Series

Note that the row for April 10 appears twice because it is in two pattern matches:

  1. It is the last day of the first match, and
  2. the first day of the second match

  • Pattern Match for a W-Shape
1. Enter the following query which demonstrates pattern match for a W -Shape:

SELECT *
FROM Ticker MATCH_RECOGNIZE (
PARTITION BY symbol
ORDER BY tstamp
MEASURES STRT.tstamp AS start_tstamp,
UP.tstamp AS end_tstamp
ONE ROW PER MATCH
AFTER MATCH SKIP TO LAST UP
PATTERN (STRT DOWN+ UP+ DOWN+ UP+)
DEFINE
DOWN AS DOWN.price < PREV(DOWN.price),
UP AS UP.price > PREV(UP.price)
) MR
WHERE symbol='ACME'
ORDER BY MR.symbol, MR.start_tstamp;

2. The output of the query is as follows:

Using SQL Pattern Matching - Series

This query is identical to the query in first query except for the line defining the PATTERN. The regular expression has been modified to seek the pattern DOWN followed by UP two consecutive times: PATTERN (STRT DOWN+ UP+ DOWN+ UP+).

This pattern specification means it can only match a W-shape where the two V-shapes have no separation between them. For instance, if there is a flat interval with the price not changing, and that interval occurs between two V-shapes, the pattern will not match that data.


4. Advanced Topics 



Pattern matching is an extremely rich feature that offers many capabilities beyond the scope of this introductory tutorial. In this section, we note four pattern matching features. For details on the full capabilities in pattern matching, see the Pattern Matching chapter in the Oracle Database 12c Release 1 (12.1) Data Warehousing Guide reference guide.
1. Nesting FIRST and LAST Within PREV and NEXT
FIRST and LAST provide navigation within the set of rows already mapped to a particular pattern variable. PREV and NEXT provide navigation using a physical offset from a particular row. You can combine navigation by nesting FIRST or LAST within PREV or NEXT: PREV (LAST (A.Price + A.Tax, 1), 3)
2. Handling Empty Matches or Unmatched Rows
ALL ROWS PER MATCH has three sub options:
  • ALL ROWS PER MATCH SHOW EMPTY MATCHES: With this option, an empty match generates a single row in the row pattern output table.
  • ALL ROWS PER MATCH OMIT EMPTY MATCHES: With this option, an empty match is omitted from the row pattern output table.
  • ALL ROWS PER MATCH WITH UNMATCHED ROWS: This option shows both empty matches and unmatched rows.
3. Excluding Portions of the Pattern from the Output
When using ALL ROWS PER MATCH with either the OMIT EMPTY MATCHES or SHOW EMPTY MATCHES options, rows matching a portion of the PATTERN may be excluded from the row pattern output table. The excluded portion is bracketed between {- and -} in the PATTERN clause.

4. Handling Patterns Where Multiple Orderings are Satisfactory Matches
You can handle patterns where multiple orderings are satisfactory matches using the PERMUTE syntax. For example PATTERN (PERMUTE (A, B, C)) is equivalent to an alternation of all permutations of three pattern variables A, B, and C, as follows:

PATTERN (A B C | A C B | B A C | B C A | C A B | C B A)

5. Rules and Restrictions in Pattern Matching
  • Input Table Requirements: The row pattern input table is the input argument to MATCH_RECOGNIZE. You can use a table or view, or a named query (defined using the WITH clause). The row pattern input table can also be a derived table (also known as in-line view).
  • Prohibited Nesting in the MATCH_RECOGNIZE Clause: The following kinds of nesting are prohibited in pattern matching:
    • Nesting one pattern matching clause within another.
    • Outer references in MEASURES or DEFINE. This means that a pattern matching may not reference any table in an outer query block except the row pattern input table.
    • Correlated subqueries may not be used in MEASURES or DEFINE. In addition, subqueries in MEASURES or DEFINE cannot reference pattern variables.
    • Pattern matching may not be used in recursive queries.
    • SELECT FOR UPDATE cannot use MATCH_RECOGNIZE.
  • Concatenated Pattern Matching: It is not prohibited to feed the output of one pattern matching into the input of another.
  • Aggregate Restrictions: Only the following aggregate functions can be used in the MEASURES and DEFINE clauses: COUNT, SUM, AVG, MAX, and MIN.

«« Previous
Next »»