      1 -- Copyright 2006-2025 Mitchell. See LICENSE.
      2 
      3 --- Lexes Scintilla documents and source code with Lua and LPeg.
      4 --
      5 -- ### Contents
      6 --
      7 -- 1. [Writing Lua Lexers](#writing-lua-lexers)
      8 -- 2. [Lexer Basics](#lexer-basics)
      9 --   - [New Lexer Template](#new-lexer-template)
     10 --   - [Tags](#tags)
     11 --   - [Rules](#rules)
     12 --   - [Summary](#summary)
     13 -- 3. [Advanced Techniques](#advanced-techniques)
     14 --   - [Line Lexers](#line-lexers)
     15 --   - [Embedded Lexers](#embedded-lexers)
     16 --   - [Lexers with Complex State](#lexers-with-complex-state)
     17 -- 4. [Code Folding](#code-folding)
     18 -- 5. [Using Lexers](#using-lexers)
     19 -- 6. [Migrating Legacy Lexers](#migrating-legacy-lexers)
     20 -- 7. [Considerations](#considerations)
     21 -- 8. [API Documentation](#lexer.add_fold_point)
     22 --
     23 -- ### Writing Lua Lexers
     24 --
     25 -- Lexers recognize and tag elements of source code for syntax highlighting. Scintilla (the
     26 -- editing component behind [Textadept][] and [SciTE][]) traditionally uses static, compiled C++
     27 -- lexers which are difficult to create and/or extend. On the other hand, Lua makes it easy to
     28 -- rapidly create new lexers, extend existing ones, and embed lexers within one another. Lua
     29 -- lexers tend to be more readable than C++ lexers too.
     30 --
     31 -- While lexers can be written in plain Lua, Scintillua prefers using Parsing Expression
     32 -- Grammars, or PEGs, composed with the Lua [LPeg library][]. As a result, this document is
     33 -- devoted to writing LPeg lexers. The following table comes from the LPeg documentation and
     34 -- summarizes all you need to know about constructing basic LPeg patterns. This module provides
     35 -- convenience functions for creating and working with other more advanced patterns and concepts.
     36 --
     37 -- Operator | Description
     38 -- -|-
     39 -- `lpeg.P`(*string*) | Matches string *string* literally.
     40 -- `lpeg.P`(*n*) | Matches exactly *n* characters.
     41 -- `lpeg.S`(*string*) | Matches any character in string set *string*.
     42 -- `lpeg.R`("*xy*") | Matches any character in the range *x* to *y*.
     43 -- *patt*`^`*n* | Matches at least *n* repetitions of *patt*.
     44 -- *patt*`^`-*n* | Matches at most *n* repetitions of *patt*.
     45 -- *patt1* `*` *patt2* | Matches *patt1* followed by *patt2*.
     46 -- *patt1* `+` *patt2* | Matches *patt1* or *patt2* (ordered choice).
     47 -- *patt1* `-` *patt2* | Matches *patt1* if *patt2* does not also match.
     48 -- `-`*patt* | Matches if *patt* does not match, consuming no input.
     49 -- `#`*patt* | Matches *patt* but consumes no input.
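--
-- As a rough illustration of how these operators compose, the sketch below builds three small
-- patterns with LPeg on its own. It assumes LPeg is installed and loadable with
-- `require('lpeg')`; the pattern names are made up for this example.
--
-- ```lua
-- local lpeg = require('lpeg')
-- local P, R, S = lpeg.P, lpeg.R, lpeg.S
--
-- -- A letter or underscore followed by any number of word characters.
-- local word_char = R('AZ', 'az', '09') + '_'
-- local identifier = (R('AZ', 'az') + '_') * word_char^0
-- -- An optionally signed run of one or more digits.
-- local integer = S('+-')^-1 * R('09')^1
-- -- The word "end", but only when no word character follows it.
-- local end_keyword = P('end') * -word_char
--
-- print(identifier:match('foo_1 bar')) -- 6
-- print(integer:match('-42;')) -- 4
-- print(end_keyword:match('endpoint')) -- nil
-- ```
--
-- `match()` returns the position just after the matched text, or `nil` when the pattern does
-- not match at the start of the subject.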
     50 --
     51 -- The first part of this document deals with rapidly constructing a simple lexer. The next part
     52 -- deals with more advanced techniques, such as embedding lexers within one another. Following
     53 -- that is a discussion about code folding, or being able to tell Scintilla which code blocks
     54 -- are "foldable" (temporarily hideable from view). After that are instructions on how to use
     55 -- Lua lexers with the aforementioned Textadept and SciTE editors. Finally, there are comments
     56 -- on lexer performance and limitations.
     57 --
     58 -- [LPeg library]: http://www.inf.puc-rio.br/~roberto/lpeg/lpeg.html
     59 -- [Textadept]: https://orbitalquark.github.io/textadept
     60 -- [SciTE]: https://scintilla.org/SciTE.html
     61 --
     62 -- ### Lexer Basics
     63 --
     64 -- The *lexers/* directory contains all of Scintillua's Lua lexers, including any new ones you
     65 -- write. Before attempting to write one from scratch though, first determine if your programming
     66 -- language is similar to any of the 100+ languages supported. If so, you may be able to copy
     67 -- and modify, or inherit from that lexer, saving some time and effort. The filename of your
     68 -- lexer should be the name of your programming language in lower case followed by a *.lua*
     69 -- extension. For example, a new Lua lexer has the name *lua.lua*.
     70 --
     71 -- #### New Lexer Template
     72 --
     73 -- There is a *lexers/template.txt* file that contains a simple template for a new lexer. Feel
     74 -- free to use it, replacing the '?' with the name of your lexer. Consider this snippet from
     75 -- the template:
     76 --
     77 -- ```lua
     78 -- -- ? LPeg lexer.
     79 --
     80 -- local lexer = lexer
     81 -- local P, S = lpeg.P, lpeg.S
     82 --
     83 -- local lex = lexer.new(...)
     84 --
     85 -- --[[... lexer rules ...]]
     86 --
     87 -- -- Identifier.
     88 -- local identifier = lex:tag(lexer.IDENTIFIER, lexer.word)
     89 -- lex:add_rule('identifier', identifier)
     90 --
     91 -- --[[... more lexer rules ...]]
     92 --
     93 -- return lex
     94 -- ```
     95 --
     96 -- The first line of code is a Lua convention to store a global variable into a local variable
     97 -- for quick access. The second line simply defines often used convenience variables. The third
     98 -- and last lines [define](#lexer.new) and return the lexer object Scintillua uses; they are
     99 -- very important and must be part of every lexer. Note the `...` passed to `lexer.new()` is
    100 -- literal: the lexer will assume the name of its filename or an alternative name specified
    101 -- by `lexer.load()` in embedded lexer applications. The fourth line uses something called a
    102 -- "tag", an essential component of lexers. You will learn about tags shortly. The fifth line
    103 -- defines a lexer grammar rule, which you will learn about later. (Be aware that it is common
    104 -- practice to combine these two lines for short rules.) Note, however, the `local` prefix in
    105 -- front of variables, which is needed so as not to affect Lua's global environment. All in all,
    106 -- this is a minimal, working lexer that you can build on.
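--
-- For reference, the combined form of the tag and rule lines might look like the following
-- sketch, which uses only the calls already shown in the template:
--
-- ```lua
-- -- Identifier (tag and rule defined in one statement).
-- lex:add_rule('identifier', lex:tag(lexer.IDENTIFIER, lexer.word))
-- ```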
    107 --
    108 -- #### Tags
    109 --
    110 -- Take a moment to think about your programming language's structure. What kind of key elements
    111 -- does it have? Most languages have elements like keywords, strings, and comments. The
    112 -- lexer's job is to break down source code into these elements and "tag" them for syntax
    113 -- highlighting. Therefore, tags are an essential component of lexers. It is up to you how
    114 -- specific your lexer is when it comes to tagging elements. Perhaps only distinguishing between
    115 -- keywords and identifiers is necessary, or maybe recognizing constants and built-in functions,
    116 -- methods, or libraries is desirable. The Lua lexer, for example, tags the following elements:
    117 -- keywords, functions, constants, identifiers, strings, comments, numbers, labels, attributes,
    118 -- and operators. Even though functions and constants are subsets of identifiers, Lua programmers
    119 -- find it helpful for the lexer to distinguish between them all. It is perfectly acceptable
    120 -- to just recognize keywords and identifiers.
    121 --
    122 -- In a lexer, LPeg patterns that match particular sequences of characters are tagged with a
    123 -- tag name using the `lexer.tag()` function. Let us examine the "identifier" tag used in
    124 -- the template shown earlier:
    125 --
    126 -- ```lua
    127 -- local identifier = lex:tag(lexer.IDENTIFIER, lexer.word)
    128 -- ```
    129 --
    130 -- At first glance, the first argument does not appear to be a string name and the second
    131 -- argument does not appear to be an LPeg pattern. Perhaps you expected something like:
    132 --
    133 -- ```lua
    134 -- lex:tag('identifier', (lpeg.R('AZ', 'az') + '_') * (lpeg.R('AZ', 'az', '09') + '_')^0)
    135 -- ```
    136 --
    137 -- The `lexer` module actually provides a convenient list of common tag names and common LPeg
    138 -- patterns for you to use. Tag names for programming languages include (but are not limited
    139 -- to) `lexer.DEFAULT`, `lexer.COMMENT`, `lexer.STRING`, `lexer.NUMBER`, `lexer.KEYWORD`,
    140 -- `lexer.IDENTIFIER`, `lexer.OPERATOR`, `lexer.ERROR`, `lexer.PREPROCESSOR`, `lexer.CONSTANT`,
    141 -- `lexer.CONSTANT_BUILTIN`, `lexer.VARIABLE`, `lexer.VARIABLE_BUILTIN`, `lexer.FUNCTION`,
    142 -- `lexer.FUNCTION_BUILTIN`, `lexer.FUNCTION_METHOD`, `lexer.CLASS`, `lexer.TYPE`, `lexer.LABEL`,
    143 -- `lexer.REGEX`, `lexer.EMBEDDED`, and `lexer.ANNOTATION`. Tag names for markup languages include
    144 -- (but are not limited to) `lexer.TAG`, `lexer.ATTRIBUTE`, `lexer.HEADING`, `lexer.BOLD`,
    145 -- `lexer.ITALIC`, `lexer.UNDERLINE`, `lexer.CODE`, `lexer.LINK`, `lexer.REFERENCE`, and
    146 -- `lexer.LIST`. Patterns include `lexer.any`, `lexer.alpha`, `lexer.digit`, `lexer.alnum`,
    147 -- `lexer.lower`, `lexer.upper`, `lexer.xdigit`, `lexer.graph`, `lexer.punct`, `lexer.space`,
    148 -- `lexer.newline`, `lexer.nonnewline`, `lexer.dec_num`, `lexer.hex_num`, `lexer.oct_num`,
    149 -- `lexer.bin_num`, `lexer.integer`, `lexer.float`, `lexer.number`, and `lexer.word`. You may use
    150 -- your own tag names if none of the above fit your language, but an advantage to using predefined
    151 -- tag names is that the language elements your lexer recognizes will inherit any universal syntax
    152 -- highlighting color theme that your editor uses. You can also "subclass" existing tag names by
    153 -- appending a '.*subclass*' string to them. For example, the HTML lexer tags unknown tags as
    154 -- `lexer.TAG .. '.unknown'`. This gives editors the opportunity to highlight those subclassed
    155 -- tags in a different way than normal tags, or fall back to highlighting them as normal tags.
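--
-- A minimal sketch of a subclassed tag name and a custom tag name follows. The '.control'
-- subclass, the 'shebang' tag name, and the variable names are hypothetical and chosen only
-- for illustration; the fragment is meant to sit inside a lexer file where `lex` and `lexer`
-- are defined as in the template.
--
-- ```lua
-- -- Subclassed tag: editors without a specific style for it can fall back to the keyword style.
-- local control_keyword = lex:tag(lexer.KEYWORD .. '.control', lexer.word_match('if else while'))
-- -- Custom tag: the editor must be told how to highlight text tagged 'shebang'.
-- local shebang = lex:tag('shebang', lexer.to_eol('#!'))
-- ```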
    156 --
    157 -- ##### Example Tags
    158 --
    159 -- So, how might you recognize and tag elements like keywords, comments, and strings?  Here are
    160 -- some examples.
    161 --
    162 -- **Keywords**
    163 --
    164 -- Instead of matching *n* keywords with *n* `P('keyword_n')` ordered choices, use one
    165 -- of the following methods:
    166 --
    167 -- 1. Use the convenience function `lexer.word_match()` optionally coupled with
    168 --    `lexer.set_word_list()`. It is much easier and more efficient to write word matches like:
    169 --
    170 --    ```lua
    171 --    local keyword = lex:tag(lexer.KEYWORD, lex:word_match(lexer.KEYWORD))
    172 --    --[[...]]
    173 --    lex:set_word_list(lexer.KEYWORD, {
    174 --      'keyword_1', 'keyword_2', ..., 'keyword_n'
    175 --    })
    176 --
    177 --    local case_insensitive_keyword = lex:tag(lexer.KEYWORD, lex:word_match(lexer.KEYWORD, true))
    178 --    --[[...]]
    179 --    lex:set_word_list(lexer.KEYWORD, {
    180 --      'KEYWORD_1', 'keyword_2', ..., 'KEYword_n'
    181 --    })
    182 --
    183 --    local hyphenated_keyword = lex:tag(lexer.KEYWORD, lex:word_match(lexer.KEYWORD))
    184 --    --[[...]]
    185 --    lex:set_word_list(lexer.KEYWORD, {
    186 --      'keyword-1', 'keyword-2', ..., 'keyword-n'
    187 --    })
    188 --    ```
    189 --
    190 --    The benefit of using this method is that other lexers that inherit from, embed, or embed
    191 --    themselves into your lexer can set, replace, or extend these word lists. For example,
    192 --    the TypeScript lexer inherits from JavaScript, but extends JavaScript's keyword and type
    193 --    lists with more options.
    194 --
    195 --    This method also allows applications that use your lexer to extend or replace your word
    196 --    lists. For example, the Lua lexer includes keywords and functions for the latest version
    197 --    of Lua (5.4 at the time of writing). However, editors using that lexer might want to use
    198 --    keywords from Lua version 5.1, which is still quite popular.
    199 --
    200 --    Note that calling `lex:set_word_list()` is completely optional. Your lexer is allowed to
    201 --    expect the editor using it to supply word lists. Scintilla-based editors can do so via
    202 --    Scintilla's `ILexer5` interface.
    203 --
    204 -- 2. Use the lexer-agnostic form of `lexer.word_match()`:
    205 --
    206 --    ```lua
    207 --    local keyword = lex:tag(lexer.KEYWORD, lexer.word_match{
    208 --      'keyword_1', 'keyword_2', ..., 'keyword_n'
    209 --    })
    210 --
    211 --    local case_insensitive_keyword = lex:tag(lexer.KEYWORD, lexer.word_match({
    212 --      'KEYWORD_1', 'keyword_2', ..., 'KEYword_n'
    213 --    }, true))
    214 --
    215 --    local hyphenated_keyword = lex:tag(lexer.KEYWORD, lexer.word_match{
    216 --      'keyword-1', 'keyword-2', ..., 'keyword-n'
    217 --    })
    218 --    ```
    219 --
    220 --    For short keyword lists, you can use a single string of words. For example:
    221 --
    222 --    ```lua
    223 --    local keyword = lex:tag(lexer.KEYWORD, lexer.word_match('key_1 key_2 ... key_n'))
    224 --    ```
    225 --
    226 --    You can use this method for static word lists that do not change, or where it does not
    227 --    make sense to allow applications or other lexers to extend or replace a word list.
    228 --
    229 -- **Comments**
    230 --
    231 -- Line-style comments with one or more prefix characters are easy to express:
    232 --
    233 -- ```lua
    234 -- local shell_comment = lex:tag(lexer.COMMENT, lexer.to_eol('#'))
    235 -- local c_line_comment = lex:tag(lexer.COMMENT, lexer.to_eol('//', true))
    236 -- ```
    237 --
    238 -- The comments above start with a '#' or "//" and go to the end of the line (EOL). The second
    239 -- comment recognizes the next line also as a comment if the current line ends with a '\\'
    240 -- escape character.
    241 --
    242 -- C-style "block" comments with a start and end delimiter are also easy to express:
    243 --
    244 -- ```lua
    245 -- local c_comment = lex:tag(lexer.COMMENT, lexer.range('/*', '*/'))
    246 -- ```
    247 --
    248 -- This comment starts with a "/\*" sequence and contains anything up to and including an ending
    249 -- "\*/" sequence. The ending "\*/" is optional so the lexer can recognize unfinished comments
    250 -- as comments and highlight them properly.
    251 --
    252 -- **Strings**
    253 --
    254 -- Most programming languages allow escape sequences in strings such that a sequence like
    255 -- `\"` in a double-quoted string indicates that the `"` is not the end of the
    256 -- string. `lexer.range()` handles escapes inherently.
    257 --
    258 -- ```lua
    259 -- local dq_str = lexer.range('"')
    260 -- local sq_str = lexer.range("'")
    261 -- local string = lex:tag(lexer.STRING, dq_str + sq_str)
    262 -- ```
    263 --
    264 -- In this case, the lexer treats '\\' as an escape character in a string sequence.
    265 --
    266 -- **Numbers**
    267 --
    268 -- Most programming languages have the same format for integers and floats, so it might be as
    269 -- simple as using a predefined LPeg pattern:
    270 --
    271 -- ```lua
    272 -- local number = lex:tag(lexer.NUMBER, lexer.number)
    273 -- ```
    274 --
    275 -- However, some languages allow postfix characters on integers:
    276 --
    277 -- ```lua
    278 -- local integer = P('-')^-1 * (lexer.dec_num * S('lL')^-1)
    279 -- local number = lex:tag(lexer.NUMBER, lexer.float + lexer.hex_num + integer)
    280 -- ```
    281 --
    282 -- Other languages allow separators within numbers for better readability:
    283 --
    284 -- ```lua
    285 -- local number = lex:tag(lexer.NUMBER, lexer.number_('_')) -- recognize 1_000_000
    286 -- ```
    287 --
    288 -- Your language may need other tweaks, but it is up to you how fine-grained you want your
    289 -- highlighting to be. After all, you are not writing a compiler or interpreter!
    290 --
    291 -- #### Rules
    292 --
    293 -- Programming languages have grammars, which specify valid syntactic structure. For example,
    294 -- comments usually cannot appear within a string, and valid identifiers (like variable names)
    295 -- cannot be keywords. In Lua lexers, grammars consist of LPeg pattern rules, many of which
    296 -- are tagged.  Recall from the lexer template the `lexer.add_rule()` call, which adds a rule
    297 -- to the lexer's grammar:
    298 --
    299 -- ```lua
    300 -- lex:add_rule('identifier', identifier)
    301 -- ```
    302 --
    303 -- Each rule has an associated name, but rule names are completely arbitrary and serve only to
    304 -- identify and distinguish between different rules. Rule order is important: if text does not
    305 -- match the first rule added to the grammar, the lexer tries to match the second rule added, and
    306 -- so on. Right now this lexer simply matches identifiers under a rule named "identifier".
    307 --
    308 -- To illustrate the importance of rule order, here is an example of a simplified Lua lexer:
    309 --
    310 -- ```lua
    311 -- lex:add_rule('keyword', lex:tag(lexer.KEYWORD, ...))
    312 -- lex:add_rule('identifier', lex:tag(lexer.IDENTIFIER, ...))
    313 -- lex:add_rule('string', lex:tag(lexer.STRING, ...))
    314 -- lex:add_rule('comment', lex:tag(lexer.COMMENT, ...))
    315 -- lex:add_rule('number', lex:tag(lexer.NUMBER, ...))
    316 -- lex:add_rule('label', lex:tag(lexer.LABEL, ...))
    317 -- lex:add_rule('operator', lex:tag(lexer.OPERATOR, ...))
    318 -- ```
    319 --
    320 -- Notice how identifiers come _after_ keywords. In Lua, as with most programming languages,
    321 -- the characters allowed in keywords and identifiers are in the same set (alphanumerics plus
    322 -- underscores). If the lexer added the "identifier" rule before the "keyword" rule, all keywords
    323 -- would match identifiers and thus would be incorrectly tagged (and likewise incorrectly
    324 -- highlighted) as identifiers instead of keywords. The same idea applies to function names,
    325 -- constants, etc. that you may want to distinguish between: their rules should come before
    326 -- identifiers.
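--
-- The effect of rule order can be reproduced with plain LPeg ordered choice. The sketch below
-- only illustrates the principle and is not how the lexer combines its rules internally. It
-- assumes LPeg is loadable with `require('lpeg')` and uses `lpeg.Cc` to return a label in
-- place of a real tag; all names are made up for this example.
--
-- ```lua
-- local lpeg = require('lpeg')
-- local P, R, Cc = lpeg.P, lpeg.R, lpeg.Cc
--
-- local word_char = R('AZ', 'az', '09') + '_'
-- local keyword = P('end') * -word_char * Cc('keyword')
-- local identifier = (R('AZ', 'az') + '_') * word_char^0 * Cc('identifier')
--
-- local right_order = keyword + identifier -- keywords are tried first
-- local wrong_order = identifier + keyword -- identifiers shadow keywords
--
-- print(right_order:match('end')) -- keyword
-- print(wrong_order:match('end')) -- identifier
-- print(right_order:match('ending')) -- identifier
-- ```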
    327 --
    328 -- So what about text that does not match any rules? For example, in Lua, the '!' character is
    329 -- meaningless outside a string or comment. Normally the lexer skips over such text. If instead
    330 -- you want to highlight these "syntax errors", add a final rule:
    331 --
    332 -- ```lua
    333 -- lex:add_rule('keyword', keyword)
    334 -- --[[...]]
    335 -- lex:add_rule('error', lex:tag(lexer.ERROR, lexer.any))
    336 -- ```
    337 --
    338 -- This identifies and tags any character not matched by an existing rule as a `lexer.ERROR`.
    339 --
    340 -- Even though the rules defined in the examples above contain a single tagged pattern, rules may
    341 -- consist of multiple tagged patterns. For example, the rule for an HTML tag could consist of a
    342 -- tagged tag followed by an arbitrary number of tagged attributes, separated by whitespace. This
    343 -- allows the lexer to produce all tags separately, but in a single, convenient rule. That rule
    344 -- might look something like this:
    345 --
    346 -- ```lua
    347 -- local ws = lex:get_rule('whitespace') -- predefined rule for all lexers
    348 -- lex:add_rule('tag', tag_start * (ws * attributes)^0 * tag_end^-1)
    349 -- ```
    350 --
    351 -- Note, however, that lexers with complex rules like these are more prone to losing track of their
    352 -- state, especially if they span multiple lines.
    353 --
    354 -- #### Summary
    355 --
    356 -- Lexers primarily consist of tagged patterns and grammar rules. These patterns match language
    357 -- elements like keywords, comments, and strings, and rules dictate the order in which patterns
    358 -- are matched. At your disposal are a number of convenience patterns and functions for rapidly
    359 -- creating a lexer. If you choose to use predefined tag names (or perhaps even subclassed
    360 -- names) for your patterns, you do not have to update your editor's theme to specify how to
    361 -- syntax-highlight those patterns. Your language's elements will inherit the default syntax
    362 -- highlighting color theme your editor uses.
    363 --
    364 -- ### Advanced Techniques
    365 --
    366 -- #### Line Lexers
    367 --
    368 -- By default, lexers match the arbitrary chunks of text passed to them by Scintilla. These
    369 -- chunks may be a full document, only the visible part of a document, or even just portions of
    370 -- lines. Some lexers need to match whole lines. For example, a lexer for the output of a file
    371 -- "diff" needs to know if the line started with a '+' or '-' and then highlight the entire
    372 -- line accordingly. To indicate that your lexer matches by line, create the lexer with an
    373 -- extra parameter:
    374 --
    375 -- ```lua
    376 -- local lex = lexer.new(..., {lex_by_line = true})
    377 -- ```
    378 --
    379 -- Now the input text for the lexer is a single line at a time. Keep in mind that line lexers
    380 -- do not have the ability to look ahead to subsequent lines.
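--
-- As a sketch of the "diff" case described above, a line lexer could tag whole lines by their
-- first character. The 'addition' and 'deletion' tag and rule names here are assumptions made
-- for illustration; an editor would need styles for them.
--
-- ```lua
-- local lex = lexer.new(..., {lex_by_line = true})
--
-- -- Each rule sees one line at a time, so the prefix decides the tag for the entire line.
-- lex:add_rule('addition', lex:tag('addition', lexer.to_eol('+')))
-- lex:add_rule('deletion', lex:tag('deletion', lexer.to_eol('-')))
--
-- return lex
-- ```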
    381 --
    382 -- #### Embedded Lexers
    383 --
    384 -- Scintillua lexers embed within one another very easily, requiring minimal effort. In the
    385 -- following sections, the lexer being embedded is called the "child" lexer and the lexer a child
    386 -- is being embedded in is called the "parent". For example, consider an HTML lexer and a CSS
    387 -- lexer. Each lexer stands alone for tagging its respective HTML or CSS files. However, CSS
    388 -- can be embedded inside HTML. In this specific case, the CSS lexer is the "child" lexer with
    389 -- the HTML lexer being the "parent". Now consider an HTML lexer and a PHP lexer. This sounds
    390 -- a lot like the case with CSS, but there is a subtle difference: PHP _embeds itself into_
    391 -- HTML while CSS is _embedded in_ HTML. This fundamental difference results in two types of
    392 -- embedded lexers: a parent lexer that embeds other child lexers in it (like HTML embedding CSS),
    393 -- and a child lexer that embeds itself into a parent lexer (like PHP embedding itself in HTML).
    394 --
    395 -- ##### Parent Lexer
    396 --
    397 -- Before embedding a child lexer into a parent lexer, the parent lexer needs to load the child
    398 -- lexer. This is done with the `lexer.load()` function. For example, loading the CSS lexer
    399 -- within the HTML lexer looks like:
    400 --
    401 -- ```lua
    402 -- local css = lexer.load('css')
    403 -- ```
    404 --
    405 -- The next part of the embedding process is telling the parent lexer when to switch over
    406 -- to the child lexer and when to switch back. These indications are called the
    407 -- "start rule" and "end rule", respectively, and are just LPeg patterns. Continuing with the
    408 -- HTML/CSS example, the transition from HTML to CSS is when the lexer encounters a "style"
    409 -- tag with a "type" attribute whose value is "text/css":
    410 --
    411 -- ```lua
    412 -- local css_tag = P('<style') * P(function(input, index)
    413 --   if input:find('^[^>]+type="text/css"', index) then return true end
    414 -- end)
    415 -- ```
    416 --
    417 -- This pattern looks for the beginning of a "style" tag and searches its attribute list for
    418 -- the text "`type="text/css"`". (In this simplified example, the Lua pattern does not allow
    419 -- whitespace around the '=', nor does it accept single-quoted values, which are valid.) If there
    420 -- is a match, the functional pattern returns `true`. However, we ultimately want to tag the
    421 -- "style" tag as an HTML tag, so the actual start rule looks like this:
    422 --
    423 -- ```lua
    424 -- local css_start_rule = #css_tag * tag
    425 -- ```
    426 --
    427 -- Now that the parent knows when to switch to the child, it needs to know when to switch
    428 -- back. In the case of HTML/CSS, the switch back occurs when the lexer encounters an ending
    429 -- "style" tag, though the lexer should still tag that tag as an HTML tag:
    430 --
    431 -- ```lua
    432 -- local css_end_rule = #P('</style>') * tag
    433 -- ```
    434 --
    435 -- Once the parent loads the child lexer and defines the child's start and end rules, it embeds
    436 -- the child with the `lexer.embed()` function:
    437 --
    438 -- ```lua
    439 -- lex:embed(css, css_start_rule, css_end_rule)
    440 -- ```
    441 --
    442 -- ##### Child Lexer
    443 --
    444 -- The process for instructing a child lexer to embed itself into a parent is very similar to
    445 -- embedding a child into a parent: first, load the parent lexer into the child lexer with the
    446 -- `lexer.load()` function and then create start and end rules for the child lexer. However,
    447 -- in this case, call `lexer.embed()` with switched arguments. For example, in the PHP lexer:
    448 --
    449 -- ```lua
    450 -- local html = lexer.load('html')
    451 -- local php_start_rule = lex:tag('php_tag', '<?php' * lexer.space)
    452 -- local php_end_rule = lex:tag('php_tag', '?>')
    453 -- html:embed(lex, php_start_rule, php_end_rule)
    454 -- ```
    455 --
    456 -- Note that the use of a 'php_tag' tag will require the editor using the lexer to specify how
    457 -- to highlight text with that tag. In order to avoid this, you could use the `lexer.PREPROCESSOR`
    458 -- tag instead.
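--
-- With that substitution, the PHP fragment above might read as follows. This is the same
-- sketch with only the tag name changed:
--
-- ```lua
-- local html = lexer.load('html')
-- -- Use a predefined tag name so the editor's existing theme applies to the PHP delimiters.
-- local php_start_rule = lex:tag(lexer.PREPROCESSOR, '<?php' * lexer.space)
-- local php_end_rule = lex:tag(lexer.PREPROCESSOR, '?>')
-- html:embed(lex, php_start_rule, php_end_rule)
-- ```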
    459 --
    460 -- #### Lexers with Complex State
    461 --
    462 -- The vast majority of lexers are not stateful and can operate on any chunk of text in a
    463 -- document. However, there may be rare cases where a lexer does need to keep track of some
    464 -- sort of persistent state. Rather than using `lpeg.P` function patterns that set state
    465 -- variables, it is recommended to make use of Scintilla's built-in, per-line state integers via
    466 -- `lexer.line_state`. It was designed to accommodate up to 32 bit-flags for tracking state.
    467 -- `lexer.line_from_position()` will return the line for any position given to an `lpeg.P`
    468 -- function pattern. (Any positions derived from that position argument will also work.)
    469 --
    470 -- Writing stateful lexers is beyond the scope of this document.
    471 --
    472 -- ### Code Folding
    473 --
    474 -- When reading source code, it is occasionally helpful to temporarily hide blocks of code like
    475 -- functions, classes, comments, etc. This is the concept of "folding". In the Textadept and
    476 -- SciTE editors for example, little markers in the editor margins appear next to code that
    477 -- can be folded at places called "fold points". When the user clicks on one of those markers,
    478 -- the editor hides the code associated with the marker until the user clicks on the marker
    479 -- again. The lexer specifies these fold points and what code exactly to fold.
    480 --
    481 -- The fold points for most languages occur on keywords or character sequences. Examples of
    482 -- fold keywords are "if" and "end" in Lua and examples of fold character sequences are '{',
    483 -- '}', "/\*", and "\*/" in C for code block and comment delimiters, respectively. However,
    484 -- these fold points cannot occur just anywhere. For example, lexers should not recognize fold
    485 -- keywords that appear within strings or comments. The `lexer.add_fold_point()` function allows
    486 -- you to conveniently define fold points with such granularity. For example, consider C:
    487 --
    488 -- ```lua
    489 -- lex:add_fold_point(lexer.OPERATOR, '{', '}')
    490 -- lex:add_fold_point(lexer.COMMENT, '/*', '*/')
    491 -- ```
    492 --
    493 -- The first call states that any '{' or '}' that the lexer tagged as a `lexer.OPERATOR`
    494 -- is a fold point. Likewise, the second call states that any "/\*" or "\*/" that the
    495 -- lexer tagged as part of a `lexer.COMMENT` is a fold point. The lexer does not consider any
    496 -- occurrences of these characters outside their tagged elements (such as in a string) as fold
    497 -- points. How do you specify fold keywords? Here is an example for Lua:
    498 --
    499 -- ```lua
    500 -- lex:add_fold_point(lexer.KEYWORD, 'if', 'end')
    501 -- lex:add_fold_point(lexer.KEYWORD, 'do', 'end')
    502 -- lex:add_fold_point(lexer.KEYWORD, 'function', 'end')
    503 -- lex:add_fold_point(lexer.KEYWORD, 'repeat', 'until')
    504 -- ```
    505 --
    506 -- If your lexer has case-insensitive keywords as fold points, simply add a
    507 -- `case_insensitive_fold_points = true` option to `lexer.new()`, and specify keywords in
    508 -- lower case.
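--
-- A short sketch of that option follows. The "begin"/"end" keyword pair is hypothetical, and
-- the example assumes the lexer's keyword rule already tags these words as `lexer.KEYWORD`
-- regardless of case.
--
-- ```lua
-- local lex = lexer.new(..., {case_insensitive_fold_points = true})
--
-- --[[... lexer rules ...]]
--
-- -- Fold keywords are given in lower case; other spellings such as "BEGIN" are then intended
-- -- to be treated the same way.
-- lex:add_fold_point(lexer.KEYWORD, 'begin', 'end')
-- ```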
    509 --
    510 -- If your lexer needs to do some additional processing in order to determine if a tagged element
    511 -- is a fold point, pass a function to `lex:add_fold_point()` that returns an integer. A return
    512 -- value of `1` indicates the element is a beginning fold point and a return value of `-1`
    513 -- indicates the element is an ending fold point. A return value of `0` indicates the element
    514 -- is not a fold point. For example:
    515 --
    516 -- ```lua
    517 -- local function fold_strange_element(text, pos, line, s, symbol)
    518 --   if ... then
    519 --     return 1 -- beginning fold point
    520 --   elseif ... then
    521 --     return -1 -- ending fold point
    522 --   end
    523 --   return 0
    524 -- end
    525 --
    526 -- lex:add_fold_point('strange_element', '|', fold_strange_element)
    527 -- ```
    528 --
    529 -- Any time the lexer encounters a '|' that is tagged as a "strange_element", it calls the
    530 -- `fold_strange_element` function to determine if '|' is a fold point. The lexer calls these
    531 -- functions with the following arguments: the text to identify fold points in, the beginning
    532 -- position of the current line in the text to fold, the current line's text, the position in
    533 -- the current line the fold point text starts at, and the fold point text itself.
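--
-- To make those arguments concrete, here is one possible fold function, written as plain Lua
-- so it can be tried outside of a lexer. The folding rule itself is invented for illustration:
-- a '|' opens a fold when it is the first non-blank text on its line and closes one when it is
-- the last. The function name is hypothetical, and the sketch assumes `s` is a 1-based index
-- into `line`.
--
-- ```lua
-- local function fold_edge_pipe(text, pos, line, s, symbol)
--   local before = line:sub(1, s - 1) -- text on the line before the symbol
--   local after = line:sub(s + #symbol) -- text on the line after the symbol
--   if before:find('^%s*$') then
--     return 1 -- beginning fold point
--   elseif after:find('^%s*$') then
--     return -1 -- ending fold point
--   end
--   return 0 -- not a fold point
-- end
--
-- print(fold_edge_pipe('', 1, '| start', 1, '|')) -- 1
-- print(fold_edge_pipe('', 1, 'finish |', 8, '|')) -- -1
-- print(fold_edge_pipe('', 1, 'a | b', 3, '|')) -- 0
-- ```
--
-- Inside a lexer, such a function would be registered in the same way as
-- `fold_strange_element` above.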
    534 --
    535 -- #### Fold by Indentation
    536 --
    537 -- Some languages have significant whitespace and/or no delimiters that indicate fold points. If
    538 -- your lexer falls into this category and you would like to mark fold points based on changes
    539 -- in indentation, create the lexer with a `fold_by_indentation = true` option:
    540 --
    541 -- ```lua
    542 -- local lex = lexer.new(..., {fold_by_indentation = true})
    543 -- ```
    544 --
    545 -- #### Custom Folding
    546 --
    547 -- Lexers with complex folding needs can implement their own folders by defining their own
    548 -- [`lex:fold()`](#lexer.fold) method. Writing custom folders is beyond the scope of this document.
    549 --
    550 -- ### Using Lexers
    551 --
    552 -- **Textadept**
    553 --
    554 -- Place your lexer in your *~/.textadept/lexers/* directory so you do not overwrite it when
    555 -- upgrading Textadept. Also, lexers in this directory override default lexers. Thus, Textadept
    556 -- loads a user *lua* lexer instead of the default *lua* lexer. This is convenient for tweaking
    557 -- a default lexer to your liking. Then add a [file extension](#lexer.detect_extensions) for
    558 -- your lexer if necessary.
    559 --
    560 -- **SciTE**
    561 --
    562 -- Create a *.properties* file for your lexer and `import` it in either your *SciTEUser.properties*
    563 -- or *SciTEGlobal.properties*. The *.properties* file should contain:
    564 --
    565 -- 	file.patterns.[lexer_name]=[file_patterns]
    566 -- 	lexer.$(file.patterns.[lexer_name])=scintillua.[lexer_name]
    567 -- 	keywords.$(file.patterns.[lexer_name])=scintillua
    568 -- 	keywords2.$(file.patterns.[lexer_name])=scintillua
    569 -- 	...
    570 -- 	keywords9.$(file.patterns.[lexer_name])=scintillua
    571 --
    572 -- where `[lexer_name]` is the name of your lexer (minus the *.lua* extension) and
    573 -- `[file_patterns]` is a set of file extensions to use your lexer for. The `keywords` settings
    574 -- are only needed if another SciTE properties file has defined keyword sets for `[file_patterns]`.
    575 -- The `scintillua` keyword setting instructs Scintillua to use the keyword sets defined within
    576 -- the lexer. You can override a lexer's keyword set(s) by specifying your own in the same order
    577 -- that the lexer calls `lex:set_word_list()`. For example, the Lua lexer's first set of keywords
    578 -- is for reserved words, the second is for built-in global functions, the third is for library
    579 -- functions, the fourth is for built-in global constants, and the fifth is for library constants.
    580 --
    581 -- SciTE assigns styles to tag names in order to perform syntax highlighting. Since the set of
    582 -- tag names used for a given language changes, your *.properties* file should specify styles
    583 -- for tag names instead of style numbers. For example:
    584 --
    585 -- 	scintillua.styles.my_tag=$(scintillua.styles.keyword),bold
    586 --
    587 -- ### Migrating Legacy Lexers
    588 --
    589 -- Legacy lexers are of the form:
    590 --
    591 -- ```lua
    592 -- local lexer = require('lexer')
    593 -- local token, word_match = lexer.token, lexer.word_match
    594 -- local P, S = lpeg.P, lpeg.S
    595 --
    596 -- local lex = lexer.new('?')
    597 --
    598 -- -- Whitespace.
    599 -- lex:add_rule('whitespace', token(lexer.WHITESPACE, lexer.space^1))
    600 --
    601 -- -- Keywords.
    602 -- lex:add_rule('keyword', token(lexer.KEYWORD, word_match{
    603 --   --[[...]]
    604 -- }))
    605 --
    606 -- --[[... other rule definitions ...]]
    607 --
    608 -- -- Custom.
    609 -- lex:add_rule('custom_rule', token('custom_token', ...))
    610 -- lex:add_style('custom_token', lexer.styles.keyword .. {bold = true})
    611 --
    612 -- -- Fold points.
    613 -- lex:add_fold_point(lexer.OPERATOR, '{', '}')
    614 --
    615 -- return lex
    616 -- ```
    617 --
    618 -- While Scintillua will mostly handle such legacy lexers just fine without any changes, it is
    619 -- recommended that you migrate yours. The migration process is fairly straightforward:
    620 --
    621 -- 1. `lexer` exists in the default lexer environment, so `require('lexer')` should be replaced
    622 --    by simply `lexer`. (Keep in mind `local lexer = lexer` is a Lua idiom.)
    623 -- 2. Every lexer created using `lexer.new()` should no longer specify a lexer name by string,
    624 --    but should instead use `...` (three dots), which evaluates to the lexer's filename or
    625 --    alternative name in embedded lexer applications.
    626 -- 3. Every lexer created using `lexer.new()` now includes a rule to match whitespace. Unless
    627 --    your lexer has significant whitespace, you can remove your legacy lexer's whitespace
    628 --    token and rule. Otherwise, your defined whitespace rule will replace the default one.
    629 -- 4. The concept of tokens has been replaced with tags. Instead of calling a `token()` function,
    630 --    call [`lex:tag()`](#lexer.tag).
    631 -- 5. Lexers now support replaceable word lists. Instead of calling `lexer.word_match()` with
    632 --    large word lists, call it as an instance method with an identifier string (typically
    633 --    something like `lexer.KEYWORD`). Then at the end of the lexer (before `return lex`), call
    634 --    [`lex:set_word_list()`](#lexer.set_word_list) with the same identifier and the usual
    635 --    list of words to match. This allows users of your lexer to call `lex:set_word_list()`
    636 --    with their own set of words should they wish to.
    637 -- 6. Lexers no longer specify styling information. Remove any calls to `lex:add_style()`. You
    638 --    may need to add styling information for custom tags to your editor's theme.
    639 -- 7. `lexer.last_char_includes()` has been deprecated in favor of the new `lexer.after_set()`.
    640 --    Use the character set and pattern as arguments to that new function.
    641 --
    642 -- As an example, consider the following sample legacy lexer:
    643 --
    644 -- ```lua
    645 -- local lexer = require('lexer')
    646 -- local token, word_match = lexer.token, lexer.word_match
    647 -- local P, S = lpeg.P, lpeg.S
    648 --
    649 -- local lex = lexer.new('legacy')
    650 --
    651 -- lex:add_rule('whitespace', token(lexer.WHITESPACE, lexer.space^1))
    652 -- lex:add_rule('keyword', token(lexer.KEYWORD, word_match('foo bar baz')))
    653 -- lex:add_rule('custom', token('custom', 'quux'))
    654 -- lex:add_style('custom', lexer.styles.keyword .. {bold = true})
    655 -- lex:add_rule('identifier', token(lexer.IDENTIFIER, lexer.word))
    656 -- lex:add_rule('string', token(lexer.STRING, lexer.range('"')))
    657 -- lex:add_rule('comment', token(lexer.COMMENT, lexer.to_eol('#')))
    658 -- lex:add_rule('number', token(lexer.NUMBER, lexer.number))
    659 -- lex:add_rule('operator', token(lexer.OPERATOR, S('+-*/%^=<>,.()[]{}')))
    660 --
    661 -- lex:add_fold_point(lexer.OPERATOR, '{', '}')
    662 --
    663 -- return lex
    664 -- ```
    665 --
    666 -- Following the migration steps would yield:
    667 --
    668 -- ```lua
    669 -- local lexer = lexer
    670 -- local P, S = lpeg.P, lpeg.S
    671 --
    672 -- local lex = lexer.new(...)
    673 --
    674 -- lex:add_rule('keyword', lex:tag(lexer.KEYWORD, lex:word_match(lexer.KEYWORD)))
    675 -- lex:add_rule('custom', lex:tag('custom', 'quux'))
    676 -- lex:add_rule('identifier', lex:tag(lexer.IDENTIFIER, lexer.word))
    677 -- lex:add_rule('string', lex:tag(lexer.STRING, lexer.range('"')))
    678 -- lex:add_rule('comment', lex:tag(lexer.COMMENT, lexer.to_eol('#')))
    679 -- lex:add_rule('number', lex:tag(lexer.NUMBER, lexer.number))
    680 -- lex:add_rule('operator', lex:tag(lexer.OPERATOR, S('+-*/%^=<>,.()[]{}')))
    681 --
    682 -- lex:add_fold_point(lexer.OPERATOR, '{', '}')
    683 --
    684 -- lex:set_word_list(lexer.KEYWORD, {'foo', 'bar', 'baz'})
    685 --
    686 -- return lex
    687 -- ```
    688 --
    689 -- Any editors using this lexer would have to add a style for the 'custom' tag.
    690 --
    691 -- ### Considerations
    692 --
    693 -- #### Performance
    694 --
    695 -- There might be some slight overhead when initializing a lexer, but loading a file from disk
    696 -- into Scintilla is usually more expensive. Actually painting the syntax highlighted text to
    697 -- the screen is often more expensive than the lexing operation. On modern computer systems,
    698 -- I see no difference in speed between Lua lexers and Scintilla's C++ ones. Optimize lexers for
    699 -- speed by re-arranging `lexer.add_rule()` calls so that the most common rules match first. Do
    700 -- keep in mind that order matters for similar rules.
    701 --
    702 -- In some cases, folding may be far more expensive than lexing, particularly in lexers with a
    703 -- lot of potential fold points. If your lexer is exhibiting signs of slowness, try disabling
    704 -- folding in your text editor first. If that speeds things up, you can try reducing the number
    705 -- of fold points you added, overriding `lexer.fold()` with your own implementation, or simply
    706 -- eliminating folding support from your lexer.
    707 --
    708 -- #### Limitations
    709 --
    710 -- Embedded preprocessor languages like PHP cannot completely embed themselves into their parent
    711 -- languages because the parent's tagged patterns do not support start and end rules. This
    712 -- mostly goes unnoticed, but code like
    713 --
    714 -- ```php
    715 --     <div id="<?php echo $id; ?>">
    716 -- ```
    717 --
    718 -- will not be tagged correctly. Also, these types of languages cannot currently embed themselves
    719 -- into their parent's child languages either.
    720 --
    721 -- A language cannot embed itself into something like an interpolated string. If lexing
    722 -- starts within the embedded entity, the lexer may not detect it as such, so a
    723 -- child-to-parent transition cannot happen. For example, the following Ruby code will
    724 -- not be tagged correctly:
    725 --
    726 -- ```ruby
    727 --     sum = "1 + 2 = #{1 + 2}"
    728 -- ```
    729 --
    730 -- Also, there is the potential for recursion for languages embedding themselves within themselves.
    731 --
    732 -- #### Troubleshooting
    733 --
    734 -- Errors in lexers can be tricky to debug. Lexers print Lua errors to `io.stderr` and `_G.print()`
    735 -- statements to `io.stdout`. Running your editor from a terminal is the easiest way to see
    736 -- errors as they occur.
    737 --
    738 -- #### Risks
    739 --
    740 -- Poorly written lexers can crash Scintilla (and thus its containing application),
    741 -- so unsaved data might be lost. However, I have only observed these crashes in early lexer
    742 -- development, when syntax errors or pattern errors are present. Once the lexer actually
    743 -- starts processing and tagging text (either correctly or incorrectly, it does not matter),
    744 -- I have not observed any crashes.
    745 --
    746 -- #### Acknowledgements
    747 --
    748 -- Thanks to Peter Odding for his [lexer post][] on the Lua mailing list that provided inspiration,
    749 -- and thanks to Roberto Ierusalimschy for LPeg.
    750 --
    751 -- [lexer post]: http://lua-users.org/lists/lua-l/2007-04/msg00116.html
    752 -- @module lexer
    753 local M = {}
    754 
    755 --- The tag name for default elements.
    756 -- @field DEFAULT
    757 
    758 --- The tag name for comment elements.
    759 -- @field COMMENT
    760 
    761 --- The tag name for string elements.
    762 -- @field STRING
    763 
    764 --- The tag name for number elements.
    765 -- @field NUMBER
    766 
    767 --- The tag name for keyword elements.
    768 -- @field KEYWORD
    769 
    770 --- The tag name for identifier elements.
    771 -- @field IDENTIFIER
    772 
    773 --- The tag name for operator elements.
    774 -- @field OPERATOR
    775 
    776 --- The tag name for error elements.
    777 -- @field ERROR
    778 
    779 --- The tag name for preprocessor elements.
    780 -- @field PREPROCESSOR
    781 
    782 --- The tag name for constant elements.
    783 -- @field CONSTANT
    784 
    785 --- The tag name for variable elements.
    786 -- @field VARIABLE
    787 
    788 --- The tag name for function elements.
    789 -- @field FUNCTION
    790 
    791 --- The tag name for class elements.
    792 -- @field CLASS
    793 
    794 --- The tag name for type elements.
    795 -- @field TYPE
    796 
    797 --- The tag name for label elements.
    798 -- @field LABEL
    799 
    800 --- The tag name for regex elements.
    801 -- @field REGEX
    802 
    803 --- The tag name for embedded elements.
    804 -- @field EMBEDDED
    805 
    806 --- The tag name for builtin function elements.
    807 -- @field FUNCTION_BUILTIN
    808 
    809 --- The tag name for builtin constant elements.
    810 -- @field CONSTANT_BUILTIN
    811 
    812 --- The tag name for function method elements.
    813 -- @field FUNCTION_METHOD
    814 
    815 --- The tag name for tag elements, typically in markup.
    816 -- @field TAG
    817 
    818 --- The tag name for attribute elements, typically in markup.
    819 -- @field ATTRIBUTE
    820 
    821 --- The tag name for builtin variable elements.
    822 -- @field VARIABLE_BUILTIN
    823 
    824 --- The tag name for heading elements, typically in markup.
    825 -- @field HEADING
    826 
    827 --- The tag name for bold elements, typically in markup.
    828 -- @field BOLD
    829 
    830 --- The tag name for italic elements, typically in markup.
    831 -- @field ITALIC
    832 
    833 --- The tag name for underlined elements, typically in markup.
    834 -- @field UNDERLINE
    835 
    836 --- The tag name for code elements, typically in markup.
    837 -- @field CODE
    838 
    839 --- The tag name for link elements, typically in markup.
    840 -- @field LINK
    841 
    842 --- The tag name for reference elements, typically in markup.
    843 -- @field REFERENCE
    844 
    845 --- The tag name for annotation elements.
    846 -- @field ANNOTATION
    847 
    848 --- The tag name for list item elements, typically in markup.
    849 -- @field LIST
    850 
    851 --- The initial (root) fold level.
    852 -- @field FOLD_BASE
    853 
    854 --- Bit-flag indicating that the line is blank.
    855 -- @field FOLD_BLANK
    856 
    857 --- Bit-flag indicating that the line is a fold point.
    858 -- @field FOLD_HEADER
    859 
    860 -- This comment is needed for LDoc to process the previous field.
    861 
    862 if not lpeg then lpeg = require('lpeg') end -- Scintillua's Lua environment defines _G.lpeg
    863 local lpeg = lpeg
    864 local P, R, S, V, B = lpeg.P, lpeg.R, lpeg.S, lpeg.V, lpeg.B
    865 local Ct, Cc, Cp, Cmt, C = lpeg.Ct, lpeg.Cc, lpeg.Cp, lpeg.Cmt, lpeg.C
    866 
    867 lpeg.setmaxstack(2048) -- the default of 400 is too low for complex grammars
    868 
    869 --- Default tags.
    870 local default = {
    871 	'whitespace', 'comment', 'string', 'number', 'keyword', 'identifier', 'operator', 'error',
    872 	'preprocessor', 'constant', 'variable', 'function', 'class', 'type', 'label', 'regex', 'embedded',
    873 	'function.builtin', 'constant.builtin', 'function.method', 'tag', 'attribute', 'variable.builtin',
    874 	'heading', 'bold', 'italic', 'underline', 'code', 'link', 'reference', 'annotation', 'list'
    875 }
    876 for _, name in ipairs(default) do M[name:upper():gsub('%.', '_')] = name end
    877 --- Names for predefined Scintilla styles.
    878 -- Having these here simplifies style number handling between Scintillua and Scintilla.
    879 local predefined = {
    880 	'default', 'line.number', 'brace.light', 'brace.bad', 'control.char', 'indent.guide', 'call.tip',
    881 	'fold.display.text'
    882 }
    883 for _, name in ipairs(predefined) do M[name:upper():gsub('%.', '_')] = name end
    884 
    885 --- Returns a tagged pattern.
    886 -- @param lexer Lexer to tag the pattern in.
    887 -- @param name String name to use for the tag. If it is not a predefined tag name
    888 --	(`lexer.[A-Z_]+`), its Scintilla style will likely need to be defined by the editor or
    889 --	theme using this lexer.
    890 -- @param patt LPeg pattern to tag.
    891 -- @usage local number = lex:tag(lexer.NUMBER, lexer.number)
    892 -- @usage local addition = lex:tag('addition', '+' * lexer.word)
    893 function M.tag(lexer, name, patt)
    894 	if not lexer._TAGS then
    895 		-- Create the initial maps for tag names to style numbers and styles.
    896 		local tags = {}
    897 		for i, name in ipairs(default) do tags[name], tags[i] = i, name end
    898 		for i, name in ipairs(predefined) do tags[name], tags[i + 32] = i + 32, name end
    899 		lexer._TAGS, lexer._num_styles = tags, #default + 1
    900 		lexer._extra_tags = {}
    901 	end
    902 	if not assert(lexer._TAGS, 'not a lexer instance')[name] then
    903 		local num_styles = lexer._num_styles
    904 		if num_styles == 33 then num_styles = num_styles + 8 end -- skip predefined
    905 		assert(num_styles <= 256, 'too many styles defined (256 MAX)')
    906 		lexer._TAGS[name], lexer._TAGS[num_styles], lexer._num_styles = num_styles, name, num_styles + 1
    907 		lexer._extra_tags[name] = true
    908 		-- If the lexer is a proxy or a child that embedded itself, make this tag name known to
    909 		-- the parent lexer.
    910 		if lexer._lexer then lexer._lexer:tag(name, false) end
    911 	end
    912 	return Cc(name) * (P(patt) / 0) * Cp()
    913 end
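
        -- A standalone sketch of the numbering scheme `M.tag()` applies to custom tags, written in
        -- plain Lua so it can be read apart from the rest of the module. `next_custom_style()` is a
        -- hypothetical helper, not part of this module; it assumes the 32 default tags and the 8
        -- predefined Scintilla styles listed above.
        --
        -- ```lua
        -- local NUM_DEFAULT, NUM_PREDEFINED = 32, 8 -- sizes of the `default` and `predefined` lists
        --
        -- -- Returns the style number for the next custom tag and the updated counter.
        -- local function next_custom_style(num_styles)
        --   -- Numbers 33 through 40 belong to the predefined styles, so jump over them.
        --   if num_styles == NUM_DEFAULT + 1 then num_styles = num_styles + NUM_PREDEFINED end
        --   assert(num_styles <= 256, 'too many styles defined (256 MAX)')
        --   return num_styles, num_styles + 1
        -- end
        --
        -- local counter = NUM_DEFAULT + 1 -- the initial value of `lexer._num_styles`
        -- local first, second
        -- first, counter = next_custom_style(counter)
        -- second, counter = next_custom_style(counter)
        -- ```
        --
        -- Here *first* is 41 and *second* is 42: custom tags start right after the predefined styles.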
    914 
    915 --- Returns a unique grammar rule name for one of the word lists in a lexer.
    916 -- @param lexer Lexer to use.
    917 -- @param i *i*th word list to get.
    918 local function word_list_id(lexer, i) return lexer._name .. '_wordlist' .. i end
    919 
    920 --- Returns a pattern that matches a word in a word list.
    921 -- This is a convenience function for simplifying a set of ordered choice word patterns and
    922 -- potentially allowing downstream users to configure word lists.
    923 -- @param[opt] lexer Lexer to match a word in a word list for. This parameter may be omitted
    924 --   for lexer-agnostic matching.
    925 -- @param word_list Either a string name of the word list to match from if *lexer* is given, or,
    926 --   if *lexer* is omitted, a table of words or a string list of words separated by spaces. If a
    927 --   word list name was given and there is ultimately no word list set via `lex:set_word_list()`,
    928 --   no error will be raised, but the returned pattern will not match anything.
    929 -- @param[opt=false] case_insensitive Match the word case-insensitively.
    930 -- @usage lex:add_rule('keyword', lex:tag(lexer.KEYWORD, lex:word_match(lexer.KEYWORD)))
    931 -- @usage local keyword = lex:tag(lexer.KEYWORD, lexer.word_match{'foo', 'bar', 'baz'})
    932 -- @usage local keyword = lex:tag(lexer.KEYWORD, lexer.word_match({'foo-bar', 'foo-baz',
    933 --   'bar-foo', 'bar-baz', 'baz-foo', 'baz-bar'}, true))
    934 -- @usage local keyword = lex:tag(lexer.KEYWORD, lexer.word_match('foo bar baz'))
    935 function M.word_match(lexer, word_list, case_insensitive)
    936 	if type(lexer) == 'table' and getmetatable(lexer) then
    937 		if lexer._lexer then
    938 			-- If this lexer is a proxy (e.g. rails), get the true parent (ruby) in order to get the
    939 			-- parent's word list. If this lexer is a child embedding itself (e.g. php), continue
    940 			-- getting its word list, not the parent's (html).
    941 			local parent = lexer._lexer
    942 			if not parent._CHILDREN or not parent._CHILDREN[lexer] then lexer = parent end
    943 		end
    944 
    945 		if not lexer._WORDLISTS then lexer._WORDLISTS = {case_insensitive = {}} end
    946 		local i = lexer._WORDLISTS[word_list] or #lexer._WORDLISTS + 1
    947 		lexer._WORDLISTS[word_list], lexer._WORDLISTS[i] = i, '' -- empty placeholder word list
    948 		lexer._WORDLISTS.case_insensitive[i] = case_insensitive
    949 		return V(word_list_id(lexer, i))
    950 	end
    951 
    952 	-- Lexer-agnostic word match.
    953 	word_list, case_insensitive = lexer, word_list
    954 
    955 	if type(word_list) == 'string' then
    956 		local words = word_list -- space-separated list of words
    957 		word_list = {}
    958 		for word in words:gmatch('%S+') do word_list[#word_list + 1] = word end
    959 	end
    960 
    961 	local word_chars = M.alnum + '_'
    962 	local extra_chars = ''
    963 	for _, word in ipairs(word_list) do
    964 		word_list[case_insensitive and word:lower() or word] = true
    965 		for char in word:gmatch('[^%w_%s]') do
    966 			if not extra_chars:find(char, 1, true) then extra_chars = extra_chars .. char end
    967 		end
    968 	end
    969 	if extra_chars ~= '' then word_chars = word_chars + S(extra_chars) end
    970 
    971 	-- Optimize small word sets as ordered choice. "Small" is arbitrary.
    972 	if #word_list <= 6 and not case_insensitive then
    973 		local choice = P(false)
    974 		for _, word in ipairs(word_list) do choice = choice + word:match('%S+') end
    975 		return choice * -word_chars
    976 	end
    977 
    978 	return Cmt(word_chars^1, function(input, index, word)
    979 		if case_insensitive then word = word:lower() end
    980 		return word_list[word]
    981 	end)
    982 end
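
        -- Once a candidate word has been extracted, the lexer-agnostic path above reduces to a set
        -- lookup. A plain-Lua sketch of just that part, leaving the LPeg matching out; `word_set()`
        -- and `in_set()` are hypothetical helper names, not part of this module.
        --
        -- ```lua
        -- -- Builds a lookup set from a table of words or a space-separated string of words.
        -- local function word_set(word_list, case_insensitive)
        --   if type(word_list) == 'string' then
        --     local words = {}
        --     for word in word_list:gmatch('%S+') do words[#words + 1] = word end
        --     word_list = words
        --   end
        --   local set = {}
        --   for _, word in ipairs(word_list) do
        --     set[case_insensitive and word:lower() or word] = true
        --   end
        --   return set
        -- end
        --
        -- -- Returns whether a candidate word is in the set.
        -- local function in_set(set, word, case_insensitive)
        --   if case_insensitive then word = word:lower() end
        --   return set[word] == true
        -- end
        -- ```
        --
        -- Lowercasing both the stored words and the candidate is what makes the case-insensitive
        -- variant work, which is also why it cannot use the ordered-choice shortcut for small lists.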
    983 
    984 --- Sets the words in a lexer's word list.
    985 -- This only has an effect if the lexer uses `lexer.word_match()` to reference the given list.
    986 -- @param lexer Lexer to add a word list to.
    987 -- @param name String name or number of the word list to set.
    988 -- @param word_list Table of words or a string list of words separated by
    989 --   spaces. Case-insensitivity is specified by a `lexer.word_match()` reference to this list.
    990 -- @param[opt=false] append Append *word_list* to an existing word list (if any).
    991 function M.set_word_list(lexer, name, word_list, append)
    992 	if word_list == 'scintillua' then return end -- for SciTE
    993 	if lexer._lexer then
    994 		-- If this lexer is a proxy (e.g. rails), get the true parent (ruby) in order to set the
    995 		-- parent's word list. If this lexer is a child embedding itself (e.g. php), continue
    996 		-- setting its word list, not the parent's (html).
    997 		local parent = lexer._lexer
    998 		if not parent._CHILDREN or not parent._CHILDREN[lexer] then lexer = parent end
    999 	end
   1000 
   1001 	assert(lexer._WORDLISTS, 'lexer has no word lists')
   1002 	local i = tonumber(lexer._WORDLISTS[name]) or name -- lexer._WORDLISTS[name] --> i
   1003 	if type(i) ~= 'number' or i > #lexer._WORDLISTS then return end -- silently return
   1004 
   1005 	if type(word_list) == 'string' then
   1006 		local list = {}
   1007 		for word in word_list:gmatch('%S+') do list[#list + 1] = word end
   1008 		word_list = list
   1009 	end
   1010 
   1011 	if not append or lexer._WORDLISTS[i] == '' then
   1012 		lexer._WORDLISTS[i] = word_list
   1013 	else
   1014 		local list = lexer._WORDLISTS[i]
   1015 		for _, word in ipairs(word_list) do list[#list + 1] = word end
   1016 	end
   1017 
   1018 	lexer._grammar_table = nil -- invalidate
   1019 end
   1020 
   1021 --- Adds a rule to a lexer.
   1022 -- @param lexer Lexer to add *rule* to.
   1023 -- @param id String id associated with this rule. It does not have to be the same as the name
   1024 --   passed to `lex:tag()`.
   1025 -- @param rule LPeg pattern of the rule to add.
   1026 function M.add_rule(lexer, id, rule)
   1027 	if lexer._lexer then lexer = lexer._lexer end -- proxy; get true parent
   1028 	if not lexer._rules then lexer._rules = {} end
   1029 	if id == 'whitespace' and lexer._rules[id] then -- legacy
   1030 		lexer:modify_rule(id, rule)
   1031 		return
   1032 	end
   1033 	lexer._rules[#lexer._rules + 1], lexer._rules[id] = id, rule
   1034 	lexer._grammar_table = nil -- invalidate
   1035 end
   1036 
   1037 --- Replaces a lexer's existing rule.
   1038 -- @param lexer Lexer to modify.
   1039 -- @param id String id of the rule to replace.
   1040 -- @param rule LPeg pattern of the new rule.
   1041 function M.modify_rule(lexer, id, rule)
   1042 	if lexer._lexer then lexer = lexer._lexer end -- proxy; get true parent
   1043 	assert(lexer._rules[id], 'rule does not exist')
   1044 	lexer._rules[id] = rule
   1045 	lexer._grammar_table = nil -- invalidate
   1046 end
   1047 
   1048 --- Returns a unique grammar rule name for one of the rule names in a lexer.
   1049 local function rule_id(lexer, name) return lexer._name .. '.' .. name end
   1050 
   1051 --- Returns a lexer's rule.
   1052 -- @param lexer Lexer to fetch a rule from.
   1053 -- @param id String id of the rule to fetch.
   1054 function M.get_rule(lexer, id)
   1055 	if lexer._lexer then lexer = lexer._lexer end -- proxy; get true parent
   1056 	if id == 'whitespace' then return V(rule_id(lexer, id)) end -- special case
   1057 	return assert(lexer._rules[id], 'rule does not exist')
   1058 end
   1059 
   1060 --- Embeds a child lexer into a parent lexer.
   1061 -- @param lexer Parent lexer.
   1062 -- @param child Child lexer.
   1063 -- @param start_rule LPeg pattern that matches the beginning of the child lexer.
   1064 -- @param end_rule LPeg pattern that matches the end of the child lexer.
   1065 -- @usage html:embed(css, css_start_rule, css_end_rule)
   1066 -- @usage html:embed(lex, php_start_rule, php_end_rule) -- from php lexer
   1067 function M.embed(lexer, child, start_rule, end_rule)
   1068 	if lexer._lexer then lexer = lexer._lexer end -- proxy; get true parent
   1069 
   1070 	-- Add child rules.
   1071 	assert(child._rules, 'cannot embed lexer with no rules')
   1072 	if not child._start_rules then child._start_rules = {} end
   1073 	if not child._end_rules then child._end_rules = {} end
   1074 	child._start_rules[lexer], child._end_rules[lexer] = start_rule, end_rule
   1075 	if not lexer._CHILDREN then lexer._CHILDREN = {} end
   1076 	lexer._CHILDREN[#lexer._CHILDREN + 1], lexer._CHILDREN[child] = child, true
   1077 
   1078 	-- Add child tags.
   1079 	for name in pairs(child._extra_tags) do lexer:tag(name, true) end
   1080 
   1081 	-- Add child fold symbols.
   1082 	if child._fold_points then
   1083 		for tag_name, symbols in pairs(child._fold_points) do
   1084 			if tag_name ~= '_symbols' then
   1085 				for symbol, v in pairs(symbols) do lexer:add_fold_point(tag_name, symbol, v) end
   1086 			end
   1087 		end
   1088 	end
   1089 
   1090 	-- Add child word lists.
   1091 	if child._WORDLISTS then
   1092 		for name, i in pairs(child._WORDLISTS) do
   1093 			if type(name) == 'string' and type(i) == 'number' then
   1094 				name = child._name .. '.' .. name
   1095 				lexer:word_match(name) -- for side effects
   1096 				lexer:set_word_list(name, child._WORDLISTS[i])
   1097 			end
   1098 		end
   1099 	end
   1100 
   1101 	child._lexer = lexer -- use parent's rules if child is embedding itself
   1102 end
   1103 
   1104 --- Adds a fold point to a lexer.
   1105 -- @param lexer Lexer to add a fold point to.
   1106 -- @param tag_name String tag name of fold point text.
   1107 -- @param start_symbol String fold point start text.
   1108 -- @param end_symbol Either string fold point end text, or a function that returns whether or
   1109 --   not *start_symbol* is a beginning fold point (1), an ending fold point (-1), or not a fold
   1110 --   point at all (0). If it is a function, it is passed the following arguments:
   1111 --   - `text`: The text being processed for fold points.
   1112 --   - `pos`: The position in *text* of the beginning of the line currently being processed.
   1113 --   - `line`: The text of the line currently being processed.
   1114 --   - `s`: The position of *start_symbol* in *line*.
   1115 --   - `symbol`: *start_symbol* itself.
   1116 -- @usage lex:add_fold_point(lexer.OPERATOR, '{', '}')
   1117 -- @usage lex:add_fold_point(lexer.KEYWORD, 'if', 'end')
   1118 -- @usage lex:add_fold_point('custom', '|', function(text, pos, line, s, symbol) ... end)
   1119 function M.add_fold_point(lexer, tag_name, start_symbol, end_symbol)
   1120 	if not start_symbol and not end_symbol then return end -- from legacy fold_consecutive_lines()
   1121 	if not lexer._fold_points then lexer._fold_points = {_symbols = {}} end
   1122 	local symbols = lexer._fold_points._symbols
   1123 	if not lexer._fold_points[tag_name] then lexer._fold_points[tag_name] = {} end
   1124 	if lexer._case_insensitive_fold_points then
   1125 		start_symbol = start_symbol:lower()
   1126 		if type(end_symbol) == 'string' then end_symbol = end_symbol:lower() end
   1127 	end
   1128 
   1129 	if type(end_symbol) == 'string' then
   1130 		if not symbols[end_symbol] then symbols[#symbols + 1], symbols[end_symbol] = end_symbol, true end
   1131 		lexer._fold_points[tag_name][start_symbol] = 1
   1132 		lexer._fold_points[tag_name][end_symbol] = -1
   1133 	else
   1134 		lexer._fold_points[tag_name][start_symbol] = end_symbol -- function or int
   1135 	end
   1136 	if not symbols[start_symbol] then
   1137 		symbols[#symbols + 1], symbols[start_symbol] = start_symbol, true
   1138 	end
   1139 
   1140 	-- If the lexer is a proxy or a child that embedded itself, copy this fold point to the
   1141 	-- parent lexer.
   1142 	if lexer._lexer then lexer._lexer:add_fold_point(tag_name, start_symbol, end_symbol) end
   1143 end
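
        -- For orientation, a sketch of the table `M.add_fold_point()` builds. It shows the shape of
        -- `lex._fold_points` after a single `lex:add_fold_point(lexer.OPERATOR, '{', '}')` call,
        -- assuming no other fold points were added. `fold_points` is only an illustrative local.
        --
        -- ```lua
        -- local fold_points = {
        --   -- List of all fold symbols plus a set for membership tests. The end symbol is
        --   -- recorded before the start symbol.
        --   _symbols = {'}', '{', ['}'] = true, ['{'] = true},
        --   -- Keyed by tag name (`lexer.OPERATOR` is 'operator'): 1 opens a fold, -1 closes one.
        --   operator = {['{'] = 1, ['}'] = -1}
        -- }
        -- ```
        --
        -- When *end_symbol* is a function instead of a string, the function itself is stored under
        -- the start symbol in place of 1.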
   1144 
   1145 --- Recursively adds the rules for a lexer and its children to a grammar.
   1146 -- @param g Grammar to add rules to.
   1147 -- @param lexer Lexer whose rules to add.
   1148 local function add_lexer(g, lexer)
   1149 	local rule = P(false)
   1150 
   1151 	-- Add this lexer's rules.
   1152 	for _, name in ipairs(lexer._rules) do
   1153 		local id = rule_id(lexer, name)
   1154 		g[id] = lexer._rules[name] -- ['lua.keyword'] = keyword_patt
   1155 		rule = rule + V(id) -- V('lua.keyword') + V('lua.function') + V('lua.constant') + ...
   1156 	end
   1157 	local any_id = lexer._name .. '_fallback'
   1158 	g[any_id] = lexer:tag(M.DEFAULT, M.any) -- ['lua_fallback'] = any_char
   1159 	rule = rule + V(any_id) -- ... + V('lua.operator') + V('lua_fallback')
   1160 
   1161 	-- Add this lexer's word lists.
   1162 	if lexer._WORDLISTS then
   1163 		for i = 1, #lexer._WORDLISTS do
   1164 			local id = word_list_id(lexer, i)
   1165 			local list, case_insensitive = lexer._WORDLISTS[i], lexer._WORDLISTS.case_insensitive[i]
   1166 			local patt = list ~= '' and M.word_match(list, case_insensitive) or P(false)
   1167 			g[id] = patt -- ['lua_wordlist1'] = word_match_patt or P(false)
   1168 		end
   1169 	end
   1170 
   1171 	-- Add this child lexer's end rules.
   1172 	if lexer._end_rules then
   1173 		for parent, end_rule in pairs(lexer._end_rules) do
   1174 			local back_id = lexer._name .. '_to_' .. parent._name
   1175 			g[back_id] = end_rule -- ['css_to_html'] = css_end_rule
   1176 			rule = rule - V(back_id) + -- (V('css.property') + ... + V('css_fallback')) - V('css_to_html')
   1177 			V(back_id) * V(parent._name) -- V('css_to_html') * V('html')
   1178 		end
   1179 	end
   1180 
   1181 	-- Add this child lexer's start rules.
   1182 	if lexer._start_rules then
   1183 		for parent, start_rule in pairs(lexer._start_rules) do
   1184 			local to_id = parent._name .. '_to_' .. lexer._name
   1185 			g[to_id] = start_rule * V(lexer._name) -- ['html_to_css'] = css_start_rule * V('css')
   1186 		end
   1187 	end
   1188 
   1189 	-- Finish adding this lexer's rules.
   1190 	local rule_id = lexer._name .. '_rule'
   1191 	g[rule_id] = rule -- ['lua_rule'] = V('lua.keyword') + ... + V('lua_fallback')
   1192 	g[lexer._name] = V(rule_id)^0 -- ['lua'] = V('lua_rule')^0
   1193 
   1194 	-- Add this lexer's children's rules.
   1195 	-- TODO: preprocessor languages like PHP should also embed themselves into their parent's
   1196 	-- children like HTML's CSS and Javascript.
   1197 	if not lexer._CHILDREN then return end
   1198 	for _, child in ipairs(lexer._CHILDREN) do
   1199 		add_lexer(g, child)
   1200 		local to_id = lexer._name .. '_to_' .. child._name
   1201 		g[rule_id] = V(to_id) + g[rule_id] -- ['html_rule'] = V('html_to_css') + V('html.comment') + ...
   1202 
   1203 		-- Add a child's inherited parent's rules (e.g. rhtml parent with rails child inheriting ruby).
   1204 		if child._parent_name then
   1205 			local name = child._name
   1206 			child._name = child._parent_name -- ensure parent and transition rule names are correct
   1207 			add_lexer(g, child)
   1208 			child._name = name -- restore
   1209 			local to_id = lexer._name .. '_to_' .. child._parent_name
   1210 			g[rule_id] = V(to_id) + g[rule_id] -- ['html_rule'] = V('html_to_ruby') + V('html.comment') + ...
   1211 		end
   1212 	end
   1213 end
   1214 
   1215 --- Returns the grammar for a lexer and its initial rule, (re)constructing it if necessary.
   1216 -- @param lexer Lexer to build a grammar for.
   1217 -- @param init_style Current style number. Multiple-language lexers use this to determine which
   1218 --   language to start lexing in.
   1219 local function build_grammar(lexer, init_style)
   1220 	if not lexer._rules then return end
   1221 	if not lexer._initial_rule then lexer._initial_rule = lexer._parent_name or lexer._name end
   1222 	if not lexer._grammar_table then
   1223 		local grammar = {lexer._initial_rule}
   1224 		if not lexer._parent_name then
   1225 			add_lexer(grammar, lexer)
   1226 			-- {'lua',
   1227 			--   ['lua.keyword'] = patt, ['lua.function'] = patt, ...,
   1228 			--   ['lua_wordlist.1'] = patt, ['lua_wordlist.2'] = patt, ...,
   1229 			--   ['lua_rule'] = V('lua.keyword') + ... + V('lua_fallback'),
   1230 			--   ['lua'] = V('lua_rule')^0
   1231 			-- }
   1232 			-- {'html',
   1233 			--   ['html.comment'] = patt, ['html.doctype'] = patt, ...,
   1234 			--   ['html_wordlist.1'] = patt, ['html_wordlist.2'] = patt, ...,
   1235 			--   ['html_rule'] = V('html_to_css') + V('html.comment') + ... + V('html_fallback'),
   1236 			--   ['html'] = V('html_rule')^0,
   1237 			--   ['css.property'] = patt, ['css.value'] = patt, ...,
   1238 			--   ['css_wordlist.1'] = patt, ['css_wordlist.2'] = patt, ...,
   1239 			--   ['css_to_html'] = patt,
   1240 			--   ['css_rule'] = ((V('css.property') + ... + V('css_fallback')) - V('css_to_html')) +
   1241 			--     V('css_to_html') * V('html'),
   1242 			--   ['html_to_css'] = patt * V('css'),
   1243 			--   ['css'] = V('css_rule')^0
   1244 			-- }
   1245 		else
   1246 			local name = lexer._name
   1247 			lexer._name = lexer._parent_name -- ensure parent and transition rule names are correct
   1248 			add_lexer(grammar, lexer)
   1249 			lexer._name = name -- restore
   1250 			-- {'html',
   1251 			--   ...
   1252 			--   ['html_rule'] = V('html_to_php') + V('html_to_css') + V('html.comment') + ... +
   1253 			--     V('html_fallback'),
   1254 			--   ...
   1255 			--   ['php.keyword'] = patt, ['php.type'] = patt, ...,
   1256 			--   ['php_wordlist.1'] = patt, ['php_wordlist.2'] = patt, ...,
   1257 			--   ['php_to_html'] = patt,
   1258 			--   ['php_rule'] = ((V('php.keyword') + ... + V('php_fallback')) - V('php_to_html')) +
   1259 			--     V('php_to_html') * V('html'),
   1260 			--   ['html_to_php'] = patt * V('php'),
   1261 			--   ['php'] = V('php_rule')^0
   1262 			-- }
   1263 		end
   1264 		lexer._grammar, lexer._grammar_table = Ct(P(grammar)), grammar
   1265 	end
   1266 
   1267 	-- For multilang lexers, build a new grammar whose initial rule is the current language
   1268 	-- if necessary. LPeg does not allow a variable initial rule.
   1269 	if lexer._CHILDREN then
   1270 		for style_num, tag in ipairs(lexer._TAGS) do
   1271 			if style_num == init_style then
   1272 				local lexer_name = tag:match('^whitespace%.(.+)$') or lexer._parent_name or lexer._name
   1273 				if lexer._initial_rule == lexer_name then break end
   1274 				if not lexer._grammar_table[lexer_name] then
   1275 					-- For proxy lexers like RHTML, the 'whitespace.rhtml' tag would produce the 'rhtml'
   1276 					-- lexer name, but there is no 'rhtml' rule. It should be the 'html' rule (parent)
   1277 					-- instead.
   1278 					lexer_name = lexer._parent_name or lexer._name
   1279 				end
   1280 				lexer._initial_rule = lexer_name
   1281 				lexer._grammar_table[1] = lexer._initial_rule
   1282 				lexer._grammar = Ct(P(lexer._grammar_table))
   1283 				return lexer._grammar
   1284 			end
   1285 		end
   1286 	end
   1287 
   1288 	return lexer._grammar
   1289 end
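
-- A minimal standalone sketch of the rebuild step above, assuming only that the `lpeg`
-- module is installed. The rule names 'a' and 'b' are hypothetical stand-ins for lexer
-- names. It illustrates why `build_grammar()` recompiles `_grammar_table` after changing
-- its first element: `lpeg.P()` fixes the initial rule when it compiles the table.
--
-- ```lua
-- local lpeg = require('lpeg')
-- local P, C = lpeg.P, lpeg.C
--
-- local grammar_table = {'a', a = C(P('x')^1), b = C(P('y')^1)}
-- local starts_in_a = P(grammar_table) -- compiled with initial rule 'a'
-- grammar_table[1] = 'b' -- does not affect the already-compiled pattern
-- local starts_in_b = P(grammar_table) -- recompiled with initial rule 'b'
--
-- assert(starts_in_a:match('xxy') == 'xx')
-- assert(starts_in_b:match('yyx') == 'yy')
-- assert(starts_in_a:match('yyx') == nil)
-- ```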
   1290 
   1291 --- Lexes a chunk of text.
   1292 -- @param lexer Lexer to lex text with.
   1293 -- @param text String text to lex, which may be a partial chunk, single line, or full text.
   1294 -- @param init_style Number of the text's current style. Multiple-language lexers use this to
   1295 --   determine which language to start lexing in.
   1296 -- @return table of tag names and positions.
   1297 -- @usage lex:lex(...) --> {'keyword', 2, 'whitespace.lua', 3, 'identifier', 7}
   1298 function M.lex(lexer, text, init_style)
   1299 	local grammar = build_grammar(lexer, init_style)
   1300 	if not grammar then return {M.DEFAULT, #text + 1} end
   1301 	if M._standalone then M._text, M.line_state = text, {} end
   1302 
   1303 	if lexer._lex_by_line then
   1304 		local line_from_position = M.line_from_position
   1305 		local function append(tags, line_tags, offset)
   1306 			for i = 1, #line_tags, 2 do
   1307 				tags[#tags + 1], tags[#tags + 2] = line_tags[i], line_tags[i + 1] + offset
   1308 			end
   1309 		end
   1310 		local tags = {}
   1311 		local offset = 0
   1312 		rawset(M, 'line_from_position', function(pos) return line_from_position(pos + offset) end)
   1313 		for line in text:gmatch('[^\r\n]*\r?\n?') do
   1314 			local line_tags = grammar:match(line)
   1315 			if line_tags then append(tags, line_tags, offset) end
   1316 			offset = offset + #line
   1317 			-- Use the default tag to the end of the line if none was specified.
   1318 			if tags[#tags] ~= offset + 1 then
   1319 				tags[#tags + 1], tags[#tags + 2] = 'default', offset + 1
   1320 			end
   1321 		end
   1322 		rawset(M, 'line_from_position', line_from_position)
   1323 		return tags
   1324 	end
   1325 
   1326 	return grammar:match(text)
   1327 end
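
-- A small sketch of how the flat table returned by `lex()` might be consumed. The helper
-- `to_spans()` and the sample tag table are hypothetical and hand-written for illustration;
-- they are not output captured from a real lexer. Each tag name is followed by the position
-- just past the text it covers, so consecutive positions delimit the tagged spans.
--
-- ```lua
-- local function to_spans(text, tags)
-- 	local spans, start = {}, 1
-- 	for i = 1, #tags, 2 do
-- 		spans[#spans + 1] = {tag = tags[i], text = text:sub(start, tags[i + 1] - 1)}
-- 		start = tags[i + 1]
-- 	end
-- 	return spans
-- end
--
-- local spans = to_spans('local x', {'keyword', 6, 'whitespace.lua', 7, 'identifier', 8})
-- assert(spans[1].tag == 'keyword' and spans[1].text == 'local')
-- assert(spans[2].text == ' ')
-- assert(spans[3].tag == 'identifier' and spans[3].text == 'x')
-- ```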
   1328 
   1329 --- Determines fold points in a chunk of text.
   1330 -- @param lexer Lexer to fold text with.
   1331 -- @param text String text to fold, which may be a partial chunk, single line, or full text.
   1332 -- @param start_line Line number *text* starts on, counting from 1.
   1333 -- @param start_level Fold level *text* starts with. It cannot be lower than `lexer.FOLD_BASE`
   1334 --   (1024).
   1335 -- @return table of line numbers mapped to fold levels
   1336 -- @usage lex:fold(...) --> {[1] = 1024, [2] = 9216, [3] = 1025, [4] = 1025, [5] = 1024}
   1337 function M.fold(lexer, text, start_line, start_level)
   1338 	if rawget(lexer, 'fold') then return rawget(lexer, 'fold')(lexer, text, start_line, start_level) end
   1339 	local folds = {}
   1340 	if text == '' then return folds end
   1341 	local fold = M.property_int['fold'] > 0
   1342 	local FOLD_BASE, FOLD_HEADER, FOLD_BLANK = M.FOLD_BASE, M.FOLD_HEADER, M.FOLD_BLANK
   1343 	if M._standalone then M._text, M.line_state = text, {} end
   1344 	if fold and lexer._fold_points then
   1345 		local lines = {}
   1346 		for p, l in (text .. '\n'):gmatch('()(.-)\r?\n') do lines[#lines + 1] = {p, l} end
   1347 		local fold_zero_sum_lines = M.property_int['fold.scintillua.on.zero.sum.lines'] > 0
   1348 		local fold_compact = M.property_int['fold.scintillua.compact'] > 0
   1349 		local fold_points = lexer._fold_points
   1350 		local fold_point_symbols = fold_points._symbols
   1351 		local style_at, fold_level = M.style_at, M.fold_level
   1352 		local line_num, prev_level = start_line, start_level
   1353 		local current_level = prev_level
   1354 		for _, captures in ipairs(lines) do
   1355 			local pos, line = captures[1], captures[2]
   1356 			if line ~= '' then
   1357 				if lexer._case_insensitive_fold_points then line = line:lower() end
   1358 				local ranges = {}
   1359 				local function is_valid_range(s, e)
   1360 					if not s or not e then return false end
   1361 					for i = 1, #ranges - 1, 2 do
   1362 						local range_s, range_e = ranges[i], ranges[i + 1]
   1363 						if s >= range_s and s <= range_e or e >= range_s and e <= range_e then
   1364 							return false
   1365 						end
   1366 					end
   1367 					ranges[#ranges + 1] = s
   1368 					ranges[#ranges + 1] = e
   1369 					return true
   1370 				end
   1371 				local level_decreased = false
   1372 				for _, symbol in ipairs(fold_point_symbols) do
   1373 					local word = not symbol:find('[^%w_]')
   1374 					local s, e = line:find(symbol, 1, true)
   1375 					while is_valid_range(s, e) do
   1376 						-- if not word or line:find('^%f[%w_]' .. symbol .. '%f[^%w_]', s) then
   1377 						local word_before = s > 1 and line:find('^[%w_]', s - 1)
   1378 						local word_after = line:find('^[%w_]', e + 1)
   1379 						if not word or not (word_before or word_after) then
   1380 							local style_name = style_at[pos + s - 1]
   1381 							local symbols = fold_points[style_name]
   1382 							if not symbols and style_name:find('%.') then
   1383 								symbols = fold_points[style_name:match('^[^.]+')]
   1384 							end
   1385 							local level = symbols and symbols[symbol]
   1386 							if type(level) == 'function' then
   1387 								level = level(text, pos, line, s, symbol)
   1388 							end
   1389 							if type(level) == 'number' then
   1390 								current_level = current_level + level
   1391 								if level < 0 and current_level < prev_level then
   1392 									-- Potential zero-sum line. If the level were to go back up on the same line,
   1393 									-- the line may be marked as a fold header.
   1394 									level_decreased = true
   1395 								end
   1396 							end
   1397 						end
   1398 						s, e = line:find(symbol, s + 1, true)
   1399 					end
   1400 				end
   1401 				folds[line_num] = prev_level
   1402 				if current_level > prev_level then
   1403 					folds[line_num] = prev_level + FOLD_HEADER
   1404 				elseif level_decreased and current_level == prev_level and fold_zero_sum_lines then
   1405 					if line_num > start_line then
   1406 						folds[line_num] = prev_level - 1 + FOLD_HEADER
   1407 					else
   1408 						-- Typing within a zero-sum line.
   1409 						local level = fold_level[line_num] - 1
   1410 						if level > FOLD_HEADER then level = level - FOLD_HEADER end
   1411 						if level > FOLD_BLANK then level = level - FOLD_BLANK end
   1412 						folds[line_num] = level + FOLD_HEADER
   1413 						current_level = current_level + 1
   1414 					end
   1415 				end
   1416 				if current_level < FOLD_BASE then current_level = FOLD_BASE end
   1417 				prev_level = current_level
   1418 			else
   1419 				folds[line_num] = prev_level + (fold_compact and FOLD_BLANK or 0)
   1420 			end
   1421 			line_num = line_num + 1
   1422 		end
   1423 	elseif fold and
   1424 		(lexer._fold_by_indentation or M.property_int['fold.scintillua.by.indentation'] > 0) then
   1425 		-- Indentation based folding.
   1426 		-- Calculate indentation per line.
   1427 		local indentation = {}
   1428 		for indent, line in (text .. '\n'):gmatch('([\t ]*)([^\r\n]*)\r?\n') do
   1429 			indentation[#indentation + 1] = line ~= '' and #indent
   1430 		end
   1431 		-- Find the first non-blank line before start_line. If the current line is indented, make
   1432 		-- that previous line a header and update the levels of any blank lines in between. If the
   1433 		-- current line is blank, match the level of the previous non-blank line.
   1434 		local current_level = start_level
   1435 		for i = start_line, 1, -1 do
   1436 			local level = M.fold_level[i]
   1437 			if level >= FOLD_HEADER then level = level - FOLD_HEADER end
   1438 			if level < FOLD_BLANK then
   1439 				local indent = M.indent_amount[i]
   1440 				if indentation[1] and indentation[1] > indent then
   1441 					folds[i] = FOLD_BASE + indent + FOLD_HEADER
   1442 					for j = i + 1, start_line - 1 do folds[j] = start_level + FOLD_BLANK end
   1443 				elseif not indentation[1] then
   1444 					current_level = FOLD_BASE + indent
   1445 				end
   1446 				break
   1447 			end
   1448 		end
   1449 		-- Iterate over lines, setting fold numbers and fold flags.
   1450 		for i = 1, #indentation do
   1451 			if indentation[i] then
   1452 				current_level = FOLD_BASE + indentation[i]
   1453 				folds[start_line + i - 1] = current_level
   1454 				for j = i + 1, #indentation do
   1455 					if indentation[j] then
   1456 						if FOLD_BASE + indentation[j] > current_level then
   1457 							folds[start_line + i - 1] = current_level + FOLD_HEADER
   1458 							current_level = FOLD_BASE + indentation[j] -- for any blanks below
   1459 						end
   1460 						break
   1461 					end
   1462 				end
   1463 			else
   1464 				folds[start_line + i - 1] = current_level + FOLD_BLANK
   1465 			end
   1466 		end
   1467 	else
   1468 		-- No folding, reset fold levels if necessary.
   1469 		local current_line = start_line
   1470 		for _ in text:gmatch('\r?\n') do
   1471 			folds[current_line] = start_level
   1472 			current_line = current_line + 1
   1473 		end
   1474 	end
   1475 	return folds
   1476 end
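
-- A sketch of how a fold level from the returned table can be split into its parts. It
-- assumes the flag values that `initialize_standalone_library()` uses (0x400, 0x2000, and
-- 0x1000); the helper name `decode()` is hypothetical. Plain arithmetic is used instead of
-- bitwise operators so that the snippet does not depend on a particular Lua version.
--
-- ```lua
-- local FOLD_BASE, FOLD_HEADER, FOLD_BLANK = 0x400, 0x2000, 0x1000
--
-- local function decode(level)
-- 	return {
-- 		depth = level % FOLD_BLANK - FOLD_BASE, -- nesting depth relative to FOLD_BASE
-- 		header = math.floor(level / FOLD_HEADER) % 2 == 1, -- line starts a fold
-- 		blank = math.floor(level / FOLD_BLANK) % 2 == 1 -- line is blank
-- 	}
-- end
--
-- local first = decode(9216) -- 1024 + FOLD_HEADER
-- assert(first.depth == 0 and first.header and not first.blank)
-- local inner = decode(1025)
-- assert(inner.depth == 1 and not inner.header and not inner.blank)
-- local empty = decode(1024 + FOLD_BLANK)
-- assert(empty.depth == 0 and empty.blank and not empty.header)
-- ```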
   1477 
   1478 --- Creates a new lexer.
   1479 -- @param name String lexer name. Use `...` to inherit from the file's name.
   1480 -- @param[opt] opts Table of lexer options. Options currently supported:
   1481 --   - `lex_by_line`: Only processes whole lines of text at a time (instead of arbitrary chunks
   1482 --     of text). Line lexers cannot look ahead to subsequent lines. The default value is `false`.
   1483 --   - `fold_by_indentation`: Calculate fold points based on changes in line indentation. The
   1484 --     default value is `false`.
   1485 --   - `case_insensitive_fold_points`: Fold points added via `lexer.add_fold_point()` should
   1486 --     ignore case. The default value is `false`.
   1487 --   - `no_user_word_lists`: Do not automatically allocate word lists that can be set by
   1488 --     users. This should only be set by lexers for non-programming languages, such as markup languages.
   1489 --   - `inherit`: Lexer to inherit from. The default value is `nil`.
   1490 -- @return lexer object
   1491 -- @usage lexer.new(..., {inherit = lexer.load('html')}) -- name is 'rhtml' in rhtml.lua file
   1492 function M.new(name, opts)
   1493 	local lexer = setmetatable({
   1494 		_name = assert(name, 'lexer name expected'), _lex_by_line = opts and opts['lex_by_line'],
   1495 		_fold_by_indentation = opts and opts['fold_by_indentation'],
   1496 		_case_insensitive_fold_points = opts and opts['case_insensitive_fold_points'],
   1497 		_no_user_word_lists = opts and opts['no_user_word_lists'], _lexer = opts and opts['inherit']
   1498 	}, {
   1499 		__index = {
   1500 			tag = M.tag, word_match = M.word_match, set_word_list = M.set_word_list,
   1501 			add_rule = M.add_rule, modify_rule = M.modify_rule, get_rule = M.get_rule,
   1502 			add_fold_point = M.add_fold_point, embed = M.embed, lex = M.lex, fold = M.fold, --
   1503 			add_style = function() end -- legacy
   1504 		}
   1505 	})
   1506 
   1507 	-- Add initial whitespace rule.
   1508 	-- Use a unique whitespace tag name since embedded lexing relies on these unique names.
   1509 	lexer:add_rule('whitespace', lexer:tag('whitespace.' .. name, M.space^1))
   1510 
   1511 	return lexer
   1512 end
   1513 
   1514 --- Creates a substitute for some Scintilla tables, functions, and fields that Scintillua
   1515 -- depends on when using it as a standalone module.
   1516 local function initialize_standalone_library()
   1517 	M.property = setmetatable({['scintillua.lexers'] = package.path:gsub('/%?%.lua', '/lexers')}, {
   1518 		__index = function() return '' end, __newindex = function(t, k, v) rawset(t, k, tostring(v)) end
   1519 	})
   1520 
   1521 	M.line_from_position = function(pos)
   1522 		local line = 1
   1523 		for s in M._text:gmatch('[^\n]*()') do
   1524 			if pos <= s then return line end
   1525 			line = line + 1
   1526 		end
   1527 		return line - 1 -- should not get to here
   1528 	end
   1529 
   1530 	M.text_range = function(pos, length) return M._text:sub(pos, pos + length - 1) end
   1531 
   1532 	--- Returns a line number's start and end positions.
   1533 	-- @param line Line number (1-based) to get the start and end positions of.
   1534 	local function get_line_range(line)
   1535 		local current_line = 1
   1536 		for s, e in M._text:gmatch('()[^\n]*()') do
   1537 			if current_line == line then return s, e end
   1538 			current_line = current_line + 1
   1539 		end
   1540 		return 1, 1 -- should not get to here
   1541 	end
   1542 
   1543 	M.line_start = setmetatable({}, {__index = function(_, line) return get_line_range(line) end})
   1544 	M.line_end = setmetatable({}, {
   1545 		__index = function(_, line) return select(2, get_line_range(line)) end
   1546 	})
   1547 
   1548 	M.indent_amount = setmetatable({}, {
   1549 		__index = function(_, line)
   1550 			local current_line = 1
   1551 			for s in M._text:gmatch('()[^\n]*') do
   1552 				if current_line == line then
   1553 					return #M._text:match('^[ \t]*', s):gsub('\t', string.rep(' ', 8))
   1554 				end
   1555 				current_line = current_line + 1
   1556 			end
   1557 		end
   1558 	})
   1559 
   1560 	M.FOLD_BASE, M.FOLD_HEADER, M.FOLD_BLANK = 0x400, 0x2000, 0x1000
   1561 
   1562 	M._standalone = true
   1563 end
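
-- A standalone sketch of how the substitute property tables behave. The local names
-- `property` and `property_int` mirror `M.property` above and the `M.property_int` table
-- that `M.load()` creates; the 'fold' and 'unset' keys are only examples. Values are stored
-- as strings, unset keys read as '' (or 0 through the integer view), and the integer view
-- rejects writes.
--
-- ```lua
-- local property = setmetatable({}, {
-- 	__index = function() return '' end,
-- 	__newindex = function(t, k, v) rawset(t, k, tostring(v)) end
-- })
-- local property_int = setmetatable({}, {
-- 	__index = function(_, k) return tonumber(property[k]) or 0 end,
-- 	__newindex = function() error('read-only property') end
-- })
--
-- property['fold'] = 1
-- assert(property['fold'] == '1')
-- assert(property['unset'] == '')
-- assert(property_int['fold'] == 1)
-- assert(property_int['unset'] == 0)
-- assert(not pcall(function() property_int['fold'] = 2 end))
-- ```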
   1564 
   1565 --- Searches for a lexer to load.
   1566 -- This is a safe implementation of Lua 5.2's `package.searchpath()` function that does not
   1567 -- require the package module to be loaded.
   1568 -- @param name String lexer name to search for.
   1569 -- @param path String list of ';'-separated paths to search for lexers in.
   1570 -- @return path to a lexer or `nil` plus an error message
   1571 local function searchpath(name, path)
   1572 	local tried = {}
   1573 	for part in path:gmatch('[^;]+') do
   1574 		local filename = part:gsub('%?', name)
   1575 		local ok, errmsg = loadfile(filename)
   1576 		if ok or not errmsg:find('cannot open') then return filename end
   1577 		tried[#tried + 1] = string.format("no file '%s'", filename)
   1578 	end
   1579 	return nil, table.concat(tried, '\n')
   1580 end
   1581 
   1582 --- Initializes or loads a lexer.
   1583 -- Scintilla calls this function in order to load a lexer. Parent lexers also call this function
   1584 -- in order to load child lexers and vice-versa. The user calls this function in order to load
   1585 -- a lexer when using Scintillua as a Lua library.
   1586 -- @param name String name of the lexing language.
   1587 -- @param[opt] alt_name String alternate name of the lexing language. This is useful for
   1588 --   embedding the same child lexer with multiple sets of start and end tags.
   1589 -- @return lexer object
   1590 function M.load(name, alt_name)
   1591 	assert(name, 'no lexer given')
   1592 	if not M.property then initialize_standalone_library() end
   1593 	if not M.property_int then
   1594 		-- Separate from initialize_standalone_library() so applications that choose to define
   1595 		-- M.property do not also have to define this.
   1596 		M.property_int = setmetatable({}, {
   1597 			__index = function(t, k) return tonumber(M.property[k]) or 0 end,
   1598 			__newindex = function() error('read-only property') end
   1599 		})
   1600 	end
   1601 
   1602 	-- Load the language lexer with its rules, tags, etc.
   1603 	local path = M.property['scintillua.lexers']:gsub(';', '/?.lua;') .. '/?.lua'
   1604 	local ro_lexer = setmetatable({
   1605 		WHITESPACE = 'whitespace.' .. (alt_name or name) -- legacy
   1606 	}, {__index = M})
   1607 	local env = {
   1608 		'assert', 'error', 'ipairs', 'math', 'next', 'pairs', 'print', 'select', 'string', 'table',
   1609 		'tonumber', 'tostring', 'type', 'utf8', '_VERSION', lexer = ro_lexer, lpeg = lpeg, --
   1610 		require = function() return ro_lexer end -- legacy
   1611 	}
   1612 	for _, name in ipairs(env) do env[name] = _G[name] end
   1613 	local lexer = assert(loadfile(assert(searchpath(name, path)), 't', env))(alt_name or name)
   1614 	assert(lexer, string.format("'%s.lua' did not return a lexer", name))
   1615 
   1616 	-- If the lexer is a proxy or a child that embedded itself, set the parent to be the main
   1617 	-- lexer. Keep a reference to the old parent name since embedded child start and end rules
   1618 	-- reference and use that name.
   1619 	if lexer._lexer then
   1620 		lexer = lexer._lexer
   1621 		lexer._parent_name, lexer._name = lexer._name, alt_name or name
   1622 	end
   1623 
   1624 	M.property['scintillua.comment.' .. (alt_name or name)] = M.property['scintillua.comment']
   1625 
   1626 	return lexer
   1627 end
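
-- A sketch of the path handling in `M.load()` and `searchpath()`, runnable on its own. The
-- directory names are hypothetical. The ';'-separated directory list becomes a list of
-- '?.lua' templates, and each template has its '?' replaced by the lexer name to produce a
-- candidate filename.
--
-- ```lua
-- local dirs = '/usr/share/lexers;/home/me/lexers' -- hypothetical 'scintillua.lexers' value
-- local path = dirs:gsub(';', '/?.lua;') .. '/?.lua'
-- assert(path == '/usr/share/lexers/?.lua;/home/me/lexers/?.lua')
--
-- local candidates = {}
-- for part in path:gmatch('[^;]+') do
-- 	-- Parentheses drop gsub's second return value (the substitution count).
-- 	candidates[#candidates + 1] = (part:gsub('%?', 'lua'))
-- end
-- assert(candidates[1] == '/usr/share/lexers/lua.lua')
-- assert(candidates[2] == '/home/me/lexers/lua.lua')
-- ```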
   1628 
   1629 --- Returns a table of all known lexer names.
   1630 -- This function is not available to lexers and requires the LuaFileSystem (`lfs`) module to
   1631 -- be available.
   1632 -- @param[opt] path String list of ';'-separated directories to search for lexers in. The
   1633 --   default value is Scintillua's configured lexer path.
   1634 function M.names(path)
   1635 	local lfs = require('lfs')
   1636 	if not path then path = M.property and M.property['scintillua.lexers'] end
   1637 	if not path or path == '' then
   1638 		for part in package.path:gmatch('[^;]+') do
   1639 			local dir = part:match('^(.-[/\\]?lexers)[/\\]%?%.lua$')
   1640 			if dir then
   1641 				path = dir
   1642 				break
   1643 			end
   1644 		end
   1645 	end
   1646 	local lexers = {}
   1647 	for dir in assert(path, 'lexer path not configured or found'):gmatch('[^;]+') do
   1648 		if lfs.attributes(dir, 'mode') == 'directory' then
   1649 			for file in lfs.dir(dir) do
   1650 				local name = file:match('^(.+)%.lua$')
   1651 				if name and name ~= 'lexer' and not lexers[name] then
   1652 					lexers[#lexers + 1], lexers[name] = name, true
   1653 				end
   1654 			end
   1655 		end
   1656 	end
   1657 	table.sort(lexers)
   1658 	return lexers
   1659 end
   1660 
   1661 --- Map of file extensions, without the '.' prefix, to their associated lexer names.
   1662 -- @usage lexer.detect_extensions.luadoc = 'lua'
   1663 M.detect_extensions = {}
   1664 
   1665 --- Map of first-line patterns to their associated lexer names.
   1666 -- These are Lua string patterns, not LPeg patterns.
   1667 -- @usage lexer.detect_patterns['^#!.+/zsh'] = 'bash'
   1668 M.detect_patterns = {}
   1669 
   1670 --- Returns the name of the lexer often associated with a particular filename and/or file content.
   1671 -- @param[opt] filename String filename to inspect. The default value is read from the
   1672 --   "lexer.scintillua.filename" property.
   1673 -- @param[optchain] line String first content line, such as a shebang line. The default value
   1674 --   is read from the "lexer.scintillua.line" property.
   1675 -- @return string lexer name to pass to `lexer.load()`, or `nil` if none was detected
   1676 function M.detect(filename, line)
   1677 	if not filename then filename = M.property and M.property['lexer.scintillua.filename'] or '' end
   1678 	if not line then line = M.property and M.property['lexer.scintillua.line'] or '' end
   1679 
   1680 	-- Locally scoped in order to avoid persistence in memory.
   1681 	local extensions = {
   1682 		as = 'actionscript', asc = 'actionscript', --
   1683 		adb = 'ada', ads = 'ada', --
   1684 		g = 'antlr', g4 = 'antlr', --
   1685 		ans = 'apdl', inp = 'apdl', mac = 'apdl', --
   1686 		apl = 'apl', --
   1687 		applescript = 'applescript', --
   1688 		asm = 'asm', ASM = 'asm', s = 'asm', S = 'asm', --
   1689 		asa = 'asp', asp = 'asp', hta = 'asp', --
   1690 		ahk = 'autohotkey', --
   1691 		au3 = 'autoit', a3x = 'autoit', --
   1692 		awk = 'awk', --
   1693 		bat = 'batch', cmd = 'batch', --
   1694 		bib = 'bibtex', --
   1695 		boo = 'boo', --
   1696 		cs = 'csharp', --
   1697 		c = 'c', C = 'c', cc = 'cpp', cpp = 'cpp', cxx = 'cpp', ['c++'] = 'cpp', h = 'cpp', hh = 'cpp',
   1698 		hpp = 'cpp', hxx = 'cpp', ['h++'] = 'cpp', --
   1699 		ck = 'chuck', --
   1700 		clj = 'clojure', cljs = 'clojure', cljc = 'clojure', edn = 'clojure', --
   1701 		['CMakeLists.txt'] = 'cmake', cmake = 'cmake', ['cmake.in'] = 'cmake', ctest = 'cmake',
   1702 		['ctest.in'] = 'cmake', --
   1703 		coffee = 'coffeescript', --
   1704 		cr = 'crystal', --
   1705 		css = 'css', --
   1706 		cu = 'cuda', cuh = 'cuda', --
   1707 		d = 'd', di = 'd', --
   1708 		dart = 'dart', --
   1709 		desktop = 'desktop', --
   1710 		diff = 'diff', patch = 'diff', --
   1711 		Dockerfile = 'dockerfile', --
   1712 		dot = 'dot', --
   1713 		e = 'eiffel', eif = 'eiffel', --
   1714 		ex = 'elixir', exs = 'elixir', --
   1715 		elm = 'elm', --
   1716 		erl = 'erlang', hrl = 'erlang', --
   1717 		fs = 'fsharp', --
   1718 		factor = 'factor', --
   1719 		fan = 'fantom', --
   1720 		dsp = 'faust', --
   1721 		fnl = 'fennel', --
   1722 		fish = 'fish', --
   1723 		forth = 'forth', frt = 'forth', --
   1724 		f = 'fortran', ['for'] = 'fortran', ftn = 'fortran', fpp = 'fortran', f77 = 'fortran',
   1725 		f90 = 'fortran', f95 = 'fortran', f03 = 'fortran', f08 = 'fortran', --
   1726 		fstab = 'fstab', --
   1727 		gd = 'gap', gi = 'gap', gap = 'gap', --
   1728 		gmi = 'gemini', --
   1729 		po = 'gettext', pot = 'gettext', --
   1730 		feature = 'gherkin', --
   1731 		gleam = 'gleam', --
   1732 		glslf = 'glsl', glslv = 'glsl', --
   1733 		dem = 'gnuplot', plt = 'gnuplot', --
   1734 		go = 'go', --
   1735 		groovy = 'groovy', gvy = 'groovy', --
   1736 		gtkrc = 'gtkrc', --
   1737 		ha = 'hare', --
   1738 		hs = 'haskell', --
   1739 		htm = 'html', html = 'html', shtm = 'html', shtml = 'html', xhtml = 'html', vue = 'html', --
   1740 		icn = 'icon', --
   1741 		idl = 'idl', odl = 'idl', --
   1742 		ni = 'inform', --
   1743 		cfg = 'ini', cnf = 'ini', inf = 'ini', ini = 'ini', reg = 'ini', --
   1744 		io = 'io_lang', --
   1745 		bsh = 'java', java = 'java', --
   1746 		js = 'javascript', jsfl = 'javascript', --
   1747 		jq = 'jq', --
   1748 		json = 'json', --
   1749 		jsp = 'jsp', --
   1750 		jl = 'julia', --
   1751 		bbl = 'latex', dtx = 'latex', ins = 'latex', ltx = 'latex', tex = 'latex', sty = 'latex', --
   1752 		ledger = 'ledger', journal = 'ledger', --
   1753 		less = 'less', --
   1754 		lily = 'lilypond', ly = 'lilypond', --
   1755 		cl = 'lisp', el = 'lisp', lisp = 'lisp', lsp = 'lisp', --
   1756 		litcoffee = 'litcoffee', --
   1757 		lgt = 'logtalk', --
   1758 		lua = 'lua', --
   1759 		GNUmakefile = 'makefile', iface = 'makefile', mak = 'makefile', makefile = 'makefile',
   1760 		Makefile = 'makefile', --
   1761 		md = 'markdown', markdown = 'markdown', --
   1762 		['meson.build'] = 'meson', --
   1763 		moon = 'moonscript', --
   1764 		myr = 'myrddin', --
   1765 		n = 'nemerle', --
   1766 		link = 'networkd', network = 'networkd', netdev = 'networkd', --
   1767 		nim = 'nim', --
   1768 		nix = 'nix', --
   1769 		nsh = 'nsis', nsi = 'nsis', nsis = 'nsis', --
   1770 		obs = 'objeck', --
   1771 		m = 'objective_c', mm = 'objective_c', objc = 'objective_c', --
   1772 		caml = 'caml', ml = 'caml', mli = 'caml', mll = 'caml', mly = 'caml', --
   1773 		org = 'org', --
   1774 		dpk = 'pascal', dpr = 'pascal', p = 'pascal', pas = 'pascal', --
   1775 		al = 'perl', perl = 'perl', pl = 'perl', pm = 'perl', pod = 'perl', --
   1776 		inc = 'php', php = 'php', php3 = 'php', php4 = 'php', phtml = 'php', --
   1777 		p8 = 'pico8', --
   1778 		pike = 'pike', pmod = 'pike', --
   1779 		PKGBUILD = 'pkgbuild', --
   1780 		pony = 'pony', --
   1781 		eps = 'ps', ps = 'ps', --
   1782 		ps1 = 'powershell', --
   1783 		prolog = 'prolog', --
   1784 		props = 'props', properties = 'props', --
   1785 		proto = 'protobuf', --
   1786 		pure = 'pure', --
   1787 		sc = 'python', py = 'python', pyw = 'python', --
   1788 		R = 'r', Rout = 'r', Rhistory = 'r', Rt = 'r', ['Rout.save'] = 'r', ['Rout.fail'] = 'r', --
   1789 		re = 'reason', --
   1790 		r = 'rebol', reb = 'rebol', --
   1791 		rst = 'rest', --
   1792 		orx = 'rexx', rex = 'rexx', --
   1793 		erb = 'rhtml', rhtml = 'rhtml', --
   1794 		rsc = 'routeros', --
   1795 		spec = 'rpmspec', --
   1796 		Rakefile = 'ruby', rake = 'ruby', rb = 'ruby', rbw = 'ruby', --
   1797 		rs = 'rust', --
   1798 		sass = 'sass', scss = 'sass', --
   1799 		scala = 'scala', --
   1800 		sch = 'scheme', scm = 'scheme', --
   1801 		bash = 'bash', bashrc = 'bash', bash_profile = 'bash', configure = 'bash', csh = 'bash',
   1802 		ksh = 'bash', mksh = 'bash', sh = 'bash', zsh = 'bash', --
   1803 		changes = 'smalltalk', st = 'smalltalk', sources = 'smalltalk', --
   1804 		sml = 'sml', fun = 'sml', sig = 'sml', --
   1805 		sno = 'snobol4', SNO = 'snobol4', --
   1806 		spin = 'spin', --
   1807 		ddl = 'sql', sql = 'sql', --
   1808 		automount = 'systemd', device = 'systemd', mount = 'systemd', path = 'systemd',
   1809 		scope = 'systemd', service = 'systemd', slice = 'systemd', socket = 'systemd', swap = 'systemd',
   1810 		target = 'systemd', timer = 'systemd', --
   1811 		taskpaper = 'taskpaper', --
   1812 		tcl = 'tcl', tk = 'tcl', --
   1813 		texi = 'texinfo', --
   1814 		toml = 'toml', --
   1815 		['1'] = 'troff', ['2'] = 'troff', ['3'] = 'troff', ['4'] = 'troff', ['5'] = 'troff',
   1816 		['6'] = 'troff', ['7'] = 'troff', ['8'] = 'troff', ['9'] = 'troff', ['1x'] = 'troff',
   1817 		['2x'] = 'troff', ['3x'] = 'troff', ['4x'] = 'troff', ['5x'] = 'troff', ['6x'] = 'troff',
   1818 		['7x'] = 'troff', ['8x'] = 'troff', ['9x'] = 'troff', --
   1819 		t2t = 'txt2tags', --
   1820 		ts = 'typescript', --
   1821 		vala = 'vala', --
   1822 		vcf = 'vcard', vcard = 'vcard', --
   1823 		v = 'verilog', ver = 'verilog', --
   1824 		vh = 'vhdl', vhd = 'vhdl', vhdl = 'vhdl', --
   1825 		bas = 'vb', cls = 'vb', ctl = 'vb', dob = 'vb', dsm = 'vb', dsr = 'vb', frm = 'vb', pag = 'vb',
   1826 		vb = 'vb', vba = 'vb', vbs = 'vb', --
   1827 		wsf = 'wsf', --
   1828 		dtd = 'xml', svg = 'xml', xml = 'xml', xsd = 'xml', xsl = 'xml', xslt = 'xml', xul = 'xml', --
   1829 		xs = 'xs', xsin = 'xs', xsrc = 'xs', --
   1830 		xtend = 'xtend', --
   1831 		yaml = 'yaml', yml = 'yaml', --
   1832 		zig = 'zig'
   1833 	}
   1834 	local patterns = {
   1835 		['^#!.+[/ ][gm]?awk'] = 'awk', ['^#!.+[/ ]lua'] = 'lua', ['^#!.+[/ ]octave'] = 'matlab',
   1836 		['^#!.+[/ ]perl'] = 'perl', ['^#!.+[/ ]php'] = 'php', ['^#!.+[/ ]python'] = 'python',
   1837 		['^#!.+[/ ]ruby'] = 'ruby', ['^#!.+[/ ]bash'] = 'bash', ['^#!.+/m?ksh'] = 'bash',
   1838 		['^#!.+/sh'] = 'bash', ['^%s*class%s+%S+%s*<%s*ApplicationController'] = 'rails',
   1839 		['^%s*class%s+%S+%s*<%s*ActionController::Base'] = 'rails',
   1840 		['^%s*class%s+%S+%s*<%s*ActiveRecord::Base'] = 'rails',
   1841 		['^%s*class%s+%S+%s*<%s*ActiveRecord::Migration'] = 'rails', ['^%s*<%?xml%s'] = 'xml',
   1842 		['^#cloud%-config'] = 'yaml'
   1843 	}
   1844 
   1845 	for patt, name in pairs(M.detect_patterns) do if line:find(patt) then return name end end
   1846 	for patt, name in pairs(patterns) do if line:find(patt) then return name end end
   1847 	local name, ext = filename:match('[^/\\]+$'), filename:match('[^.]*$')
   1848 	return M.detect_extensions[name] or extensions[name] or M.detect_extensions[ext] or
   1849 		extensions[ext]
   1850 end
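
-- A sketch of the two lookup keys `M.detect()` derives from a filename, using the same Lua
-- patterns as above. The helper name `split()` and the sample filenames are hypothetical.
-- A filename with no '.' yields the whole string as its "extension", which is one reason
-- the basename is looked up first.
--
-- ```lua
-- local function split(filename)
-- 	return filename:match('[^/\\]+$'), filename:match('[^.]*$')
-- end
--
-- local name, ext = split('src/CMakeLists.txt')
-- assert(name == 'CMakeLists.txt' and ext == 'txt')
-- name, ext = split('project/Makefile')
-- assert(name == 'Makefile' and ext == 'project/Makefile')
-- name, ext = split('init.lua')
-- assert(name == 'init.lua' and ext == 'lua')
-- ```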
   1851 
   1852 -- The following are utility functions lexers will have access to.
   1853 
   1854 -- Common patterns.
   1855 
   1856 --- A pattern that matches any single character.
   1857 M.any = P(1)
   1858 --- A pattern that matches any alphabetic character ('A'-'Z', 'a'-'z').
   1859 M.alpha = R('AZ', 'az')
   1860 --- A pattern that matches any digit ('0'-'9').
   1861 M.digit = R('09')
   1862 --- A pattern that matches any alphanumeric character ('A'-'Z', 'a'-'z', '0'-'9').
   1863 M.alnum = R('AZ', 'az', '09')
   1864 --- A pattern that matches any lower case character ('a'-'z').
   1865 M.lower = R('az')
   1866 --- A pattern that matches any upper case character ('A'-'Z').
   1867 M.upper = R('AZ')
   1868 --- A pattern that matches any hexadecimal digit ('0'-'9', 'A'-'F', 'a'-'f').
   1869 M.xdigit = R('09', 'AF', 'af')
   1870 --- A pattern that matches any graphical character ('!' to '~').
   1871 M.graph = R('!~')
   1872 --- A pattern that matches any punctuation character ('!' to '/', ':' to '@', '[' to '`', '{'
   1873 -- to '~').
   1874 M.punct = R('!/', ':@', '[`', '{~')
--- A pattern that matches any whitespace character ('\t', '\v', '\f', '\n', '\r', space).
M.space = S('\t\v\f\n\r ')

--- A pattern that matches an end of line, either CR+LF or LF.
M.newline = P('\r')^-1 * '\n'
--- A pattern that matches any single, non-newline character.
M.nonnewline = 1 - M.newline

--- Returns a pattern that matches a decimal number, whose digits may be separated by a particular
-- character.
-- @param c Digit separator character.
function M.dec_num_(c) return M.digit * (P(c)^-1 * M.digit)^0 end
--- Returns a pattern that matches a hexadecimal number, whose digits may be separated by
-- a particular character.
-- @param c Digit separator character.
function M.hex_num_(c) return '0' * S('xX') * (P(c)^-1 * M.xdigit)^1 end
--- Returns a pattern that matches an octal number, whose digits may be separated by a particular
-- character.
-- @param c Digit separator character.
function M.oct_num_(c) return '0' * (P(c)^-1 * R('07'))^1 * -M.xdigit end
--- Returns a pattern that matches a binary number, whose digits may be separated by a particular
-- character.
-- @param c Digit separator character.
function M.bin_num_(c) return '0' * S('bB') * (P(c)^-1 * S('01'))^1 * -M.xdigit end
--- Returns a pattern that matches either a decimal, hexadecimal, octal, or binary number,
-- whose digits may be separated by a particular character.
-- @param c Digit separator character.
function M.integer_(c)
	return S('+-')^-1 * (M.hex_num_(c) + M.bin_num_(c) + M.oct_num_(c) + M.dec_num_(c))
end
local function exp_(c) return S('eE') * S('+-')^-1 * M.digit * (P(c)^-1 * M.digit)^0 end
--- Returns a pattern that matches a floating point number, whose digits may be separated by a
-- particular character.
-- @param c Digit separator character.
function M.float_(c)
	return S('+-')^-1 *
		((M.dec_num_(c)^-1 * '.' * M.dec_num_(c) + M.dec_num_(c) * '.' * M.dec_num_(c)^-1 * -P('.')) *
			exp_(c)^-1 + (M.dec_num_(c) * exp_(c)))
end
--- Returns a pattern that matches a typical number, either a floating point, decimal, hexadecimal,
-- octal, or binary number, and whose digits may be separated by a particular character.
-- @param c Digit separator character.
-- @usage lexer.number_('_') -- matches 1_000_000
function M.number_(c) return M.float_(c) + M.integer_(c) end
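--[=[
The digit-separator helpers above can be tried outside an editor. The following is a minimal
sketch, assuming only that LPeg is installed. `digit` and `dec_num_` are local copies of
`lexer.digit` and `lexer.dec_num_()`, made so the snippet runs without loading this module.

```lua
local lpeg = require('lpeg')
local P, R = lpeg.P, lpeg.R

-- Local copies of lexer.digit and lexer.dec_num_().
local digit = R('09')
local function dec_num_(c) return digit * (P(c)^-1 * digit)^0 end

local with_sep = dec_num_('_')
local without_sep = dec_num_(false) -- this is how lexer.dec_num is built

-- match() returns the position just past the matched text.
assert(with_sep:match('1_000_000') == 10) -- the whole 9-character string
assert(with_sep:match('1__000') == 2) -- at most one separator between digits
assert(with_sep:match('100_') == 4) -- a trailing separator is not consumed
assert(without_sep:match('1_000') == 2) -- P(false) never matches, so no separators
```

A separator is only consumed when a digit follows it, so doubled or trailing separators end
the match early.
]=]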

--- A pattern that matches a decimal number.
M.dec_num = M.dec_num_(false)
--- A pattern that matches a hexadecimal number.
M.hex_num = M.hex_num_(false)
--- A pattern that matches an octal number.
M.oct_num = M.oct_num_(false)
--- A pattern that matches a binary number.
M.bin_num = M.bin_num_(false)
--- A pattern that matches either a decimal, hexadecimal, octal, or binary number.
M.integer = M.integer_(false)
--- A pattern that matches a floating point number.
M.float = M.float_(false)
--- A pattern that matches a typical number, either a floating point, decimal, hexadecimal,
-- octal, or binary number.
M.number = M.number_(false)

--- A pattern that matches a typical word. Words begin with a letter or underscore and consist
-- of alphanumeric and underscore characters.
M.word = (M.alpha + '_') * (M.alnum + '_')^0

--- Returns a pattern that matches a prefix until the end of its line.
-- @param[opt] prefix String or pattern prefix to start matching at. The default value is any
--   non-newline character.
-- @param[optchain=false] escape Allow newline escapes using a '\\' character.
-- @usage local line_comment = lexer.to_eol('//')
-- @usage local line_comment = lexer.to_eol(S('#;'))
function M.to_eol(prefix, escape)
	return (prefix or M.nonnewline) *
		(not escape and M.nonnewline or 1 - (M.newline + '\\') + '\\' * M.any)^0
end
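--[=[
The effect of the *escape* flag is easiest to see on a line that ends in a backslash. This is
a minimal sketch, assuming only that LPeg is installed. `any`, `newline`, `nonnewline`, and
`to_eol` are local copies of the definitions above so the snippet runs on its own.

```lua
local lpeg = require('lpeg')
local P = lpeg.P

-- Local copies of lexer.any, lexer.newline, lexer.nonnewline, and lexer.to_eol().
local any = P(1)
local newline = P('\r')^-1 * '\n'
local nonnewline = 1 - newline
local function to_eol(prefix, escape)
	return (prefix or nonnewline) *
		(not escape and nonnewline or 1 - (newline + '\\') + '\\' * any)^0
end

-- Three lines; the first one ends with a backslash.
local text = '// a \\\nb\nc'

-- Without escapes the match stops in front of the first newline (position 7).
assert(to_eol('//'):match(text) == 7)
-- With escapes the backslash-newline pair is consumed and the match runs to the end of the
-- second line, stopping in front of the second newline (position 9).
assert(to_eol('//', true):match(text) == 9)
```
]=]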

--- Returns a pattern that matches a bounded range of text.
-- This is a convenience function for matching more complicated ranges like strings with escape
-- characters, balanced parentheses, and block comments (nested or not).
-- @param s String or LPeg pattern start of the range.
-- @param[opt=s] e String or LPeg pattern end of the range. The default value is *s*.
-- @param[optchain=false] single_line Restrict the range to a single line.
-- @param[optchain] escapes Allow the range end to be escaped by a '\\' character. The default
--   value is `false` unless *s* and *e* are identical, single-character strings. In that case,
--   the default value is `true`.
-- @param[optchain=false] balanced Match a balanced range, like the "%b" Lua pattern. This flag
--   only applies if *s* and *e* are different.
-- @usage local dq_str_escapes = lexer.range('"')
-- @usage local dq_str_noescapes = lexer.range('"', false, false)
-- @usage local unbalanced_parens = lexer.range('(', ')')
-- @usage local balanced_parens = lexer.range('(', ')', false, false, true)
function M.range(s, e, single_line, escapes, balanced)
	if type(e) ~= 'string' and type(e) ~= 'userdata' then
		e, single_line, escapes, balanced = s, e, single_line, escapes
	end
	local any = M.any - e
	if single_line then any = any - '\n' end
	if balanced then any = any - s end
	-- Only allow escapes by default for ranges with identical, single-character string delimiters.
	if escapes == nil then escapes = type(s) == 'string' and #s == 1 and s == e end
	if escapes then any = any - '\\' + '\\' * M.any end
	if balanced and s ~= e then return P{s * (any + V(1))^0 * P(e)^-1} end
	return s * any^0 * P(e)^-1
end
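--[=[
The four `@usage` forms above behave differently on the same input. This is a minimal sketch,
assuming only that LPeg is installed. `any_char` and `range` are local copies of `lexer.any`
and `lexer.range()` so the snippet runs without loading this module.

```lua
local lpeg = require('lpeg')
local P, V = lpeg.P, lpeg.V

-- Local copies of lexer.any and lexer.range().
local any_char = P(1)
local function range(s, e, single_line, escapes, balanced)
	if type(e) ~= 'string' and type(e) ~= 'userdata' then
		e, single_line, escapes, balanced = s, e, single_line, escapes
	end
	local any = any_char - e
	if single_line then any = any - '\n' end
	if balanced then any = any - s end
	if escapes == nil then escapes = type(s) == 'string' and #s == 1 and s == e end
	if escapes then any = any - '\\' + '\\' * any_char end
	if balanced and s ~= e then return P{s * (any + V(1))^0 * P(e)^-1} end
	return s * any^0 * P(e)^-1
end

local str = '"a\\"b" c' -- the 8 characters "a\"b" c
local parens = '(a(b)c) d'

-- match() returns the position just past the matched text.
assert(range('"'):match(str) == 7) -- escapes on by default: stops after the last quote
assert(range('"', false, false):match(str) == 5) -- escapes off: stops after \"
assert(range('(', ')'):match(parens) == 6) -- unbalanced: stops after the first ')'
assert(range('(', ')', false, false, true):match(parens) == 8) -- balanced: nested pair kept
assert(range('"'):match('"abc') == 5) -- the end delimiter is optional
```

Because the end delimiter is optional, an unterminated string or comment is still matched up
to the end of the input.
]=]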

--- Returns a pattern that only matches when it comes after certain characters (or when there
-- are no characters behind it).
-- @param set String character set like one passed to `lpeg.S()`.
-- @param patt LPeg pattern to match after a character in *set*.
-- @param[opt] skip String character set to skip over when looking backwards from *patt*. The
--   default value is " \t\r\n\v\f" (whitespace).
-- @usage local regex = lexer.after_set('+-*!%^&|=,([{', lexer.range('/'))
--   -- matches "var re = /foo/;", but not "var x = 1 / 2 / 3;"
function M.after_set(set, patt, skip)
	if not skip then skip = ' \t\r\n\v\f' end
	local set_chars, skip_chars = {}, {}
	-- Note: cannot use utf8.codes() because Lua 5.1 is still supported.
	for char in set:gmatch('.') do set_chars[string.byte(char)] = true end
	for char in skip:gmatch('.') do skip_chars[string.byte(char)] = true end
	return (B(S(set)) + -B(1)) * patt + Cmt(C(patt), function(input, index, match, ...)
		local pos = index - #match
		if #skip > 0 then while pos > 1 and skip_chars[input:byte(pos - 1)] do pos = pos - 1 end end
		if pos == 1 or set_chars[input:byte(pos - 1)] then return index, ... end
		return nil
	end)
end
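--[=[
The look-behind in `after_set()` can be checked by starting a match in the middle of a string.
This is a minimal sketch, assuming only that LPeg is installed. `after_set` is a local copy of
the function above, and `slashes` is a hypothetical, simplified stand-in for
`lexer.range('/')` without escape handling.

```lua
local lpeg = require('lpeg')
local P, S, B, C, Cmt = lpeg.P, lpeg.S, lpeg.B, lpeg.C, lpeg.Cmt

-- Local copy of lexer.after_set().
local function after_set(set, patt, skip)
	if not skip then skip = ' \t\r\n\v\f' end
	local set_chars, skip_chars = {}, {}
	for char in set:gmatch('.') do set_chars[string.byte(char)] = true end
	for char in skip:gmatch('.') do skip_chars[string.byte(char)] = true end
	return (B(S(set)) + -B(1)) * patt + Cmt(C(patt), function(input, index, match, ...)
		local pos = index - #match
		if #skip > 0 then while pos > 1 and skip_chars[input:byte(pos - 1)] do pos = pos - 1 end end
		if pos == 1 or set_chars[input:byte(pos - 1)] then return index, ... end
		return nil
	end)
end

-- Hypothetical stand-in for lexer.range('/'): a '/'-delimited run with no escapes.
local slashes = P('/') * (1 - P('/'))^0 * P('/')^-1
local regex = after_set('+-*!%^&|=,([{', slashes)

-- The second argument to match() is the position at which matching starts.
assert(regex:match('x = /re/;', 5) == 9) -- '=' is behind the skipped space
assert(regex:match('1 / 2 / 3', 3) == nil) -- '1' is behind the skipped space
assert(regex:match('/re/') == 5) -- nothing is behind the pattern
```
]=]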

--- Returns a pattern that matches only at the beginning of a line.
-- @param patt LPeg pattern to match at the beginning of a line.
-- @param[opt=false] allow_indent Allow *patt* to match after line indentation.
-- @usage local preproc = lex:tag(lexer.PREPROCESSOR, lexer.starts_line(lexer.to_eol('#')))
function M.starts_line(patt, allow_indent)
	return M.after_set('\r\n\v\f', patt, allow_indent and ' \t' or '')
end

M.colors = {} -- legacy
M.styles = setmetatable({}, { -- legacy
	__index = function() return setmetatable({}, {__concat = function() return nil end}) end,
	__newindex = function() end
})
M.property_expanded = setmetatable({}, {__index = function() return '' end}) -- legacy

-- Legacy function that creates and returns a token pattern with token name *name* and pattern
-- *patt*.
-- Use `tag()` instead.
-- @param name The name of the token.
-- @param patt The LPeg pattern associated with the token.
-- @usage local number = token(lexer.NUMBER, lexer.number)
-- @usage local addition = token('addition', '+' * lexer.word)
function M.token(name, patt) return Cc(name) * (P(patt) / 0) * Cp() end

-- Legacy function that creates and returns a pattern that verifies the first non-whitespace
-- character behind the current match position is in string set *s*.
-- @param s String character set like one passed to `lpeg.S()`.
-- @usage local regex = #P('/') * lexer.last_char_includes('+-*!%^&|=,([{') * lexer.range('/')
function M.last_char_includes(s) return M.after_set(s, true) end

function M.fold_consecutive_lines() end -- legacy

-- The functions and fields below were defined in C.

--- Map of line numbers (starting from 1) to their fold level bit-masks. (Read-only)
-- Fold level masks are composed of an integer level combined with any of the following bits:
--
--   - `lexer.FOLD_BASE`
--     The initial fold level (1024).
--   - `lexer.FOLD_BLANK`
--     The line is blank.
--   - `lexer.FOLD_HEADER`
--     The line is a header, or fold point.
-- @table fold_level

--- Map of line numbers (starting from 1) to their indentation amounts, measured in character
-- columns. (Read-only)
-- @table indent_amount

--- Map of line numbers (starting from 1) to their 32-bit integer line states.
-- Line states can be used by lexers for keeping track of persistent states (up to 32 states
-- with 1 state per bit). For example, the output lexer uses this to mark lines that have
-- warnings or errors.
-- @table line_state

--- Map of key-value string pairs.
-- The contents of this map are application-dependent.
-- @table property

--- Alias of `lexer.property`, but with values interpreted as numbers, or `0` if not
-- found. (Read-only)
-- @table property_int

--- Map of buffer positions (starting from 1) to their string style names. (Read-only)
-- @table style_at

--- Returns a position's line number (starting from 1).
-- @param pos Position (starting from 1) to get the line number of.
-- @function line_from_position

--- Map of line numbers (starting from 1) to their start positions. (Read-only)
-- @table line_start

--- Map of line numbers (starting from 1) to their end positions. (Read-only)
-- @table line_end

--- Returns a range of buffer text.
-- The current text being lexed or folded may be a subset of buffer text. This function can
-- return any text in the buffer.
-- @param pos Position (starting from 1) of the text range to get. It needs to be an absolute
--   position. Use a combination of `lexer.line_from_position()` and `lexer.line_start`
--   to get one.
-- @param length Length of the text range to get.
-- @function text_range

return M