SolarisFox made a pretty good thread on this here, but it lacks information on some of the other features that can be used, like lazy repetition and lookaround (lookahead and lookbehind), as well as on how to optimize regex patterns and how the client handles highlights. If he wants to, he can co-write the article with me. I also think publishing the guide as an article in The Player rather than as a post on the PS forum would give it much more exposure and help more people learn how to make more useful highlights.
- What is highlighting?
- Highlights are a list of phrases that the PS client searches every chat message for. If a message contains any of these phrases, its background turns orange. By default, your username is highlighted.
- What are regular expressions, and how would they help?
- Regular expressions, also known as regexes, are sequences of characters that describe a search pattern, which the regex engine uses to find matches in text. Highlights work by combining the whole highlight list into one regex pattern and testing each chat message against it. Because of this, regex patterns can be added to the highlight list directly. Why does this matter? If a highlight list contains several similar entries, they can usually be merged into one regex pattern, making the list much easier to manage. And if you need to highlight phrases that begin or end with a certain pattern, regex is the only way to do it.
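A minimal TypeScript sketch of the idea, assuming the client joins the list entries with | inside a non-capturing group and wraps the result in word boundaries (\b), as this guide describes. The function name buildHighlightRegExp is hypothetical, and the real PS client code may differ in detail (for example, in case sensitivity).

```typescript
// Hypothetical approximation of how a highlight list becomes one regex.
// Assumption: entries are joined with "|" and wrapped in \b ... \b.
function buildHighlightRegExp(entries: string[]): RegExp {
  return new RegExp(`\\b(?:${entries.join("|")})\\b`);
}

// Two separate entries...
const separate = buildHighlightRegExp(["solaris", "solarisfox"]);
// ...can be merged into a single regex entry that matches the same words.
const merged = buildHighlightRegExp(["solaris(?:fox)?"]);

for (const message of ["ask solaris", "ask solarisfox", "ask solar"]) {
  // Both versions highlight exactly the same messages.
  console.log(message, separate.test(message), merged.test(message));
}
```

Both lists highlight "ask solaris" and "ask solarisfox" but not "ask solar", so the merged entry can replace the two separate ones.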
- How do I write regular expressions?
- Characters
- All of the following characters are reserved, since each performs a specific function, and need a backslash before them if the regex should match the literal character: \, ^, $, ., |, ?, *, +, (, ), [, ], {, and }
- For example, to match what's 9+10?, the highlight would need to be what's 9\+10\?. Without the backslashes, + and ? would act as quantifiers, and the pattern would not match the intended text.
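The escaping rule can be checked with a short TypeScript sketch. escapeRegExp is a hypothetical helper, not part of PS; it puts a backslash before every reserved character, including braces and the closing square bracket.

```typescript
// Hypothetical helper: escape reserved regex characters so text matches literally.
function escapeRegExp(text: string): string {
  // "$&" in the replacement inserts the matched character after the backslash.
  return text.replace(/[\\^$.|?*+()[\]{}]/g, "\\$&");
}

const escaped = escapeRegExp("what's 9+10?");
console.log(escaped); // what's 9\+10\?
console.log(new RegExp(escaped).test("so what's 9+10?")); // true

// Unescaped, + and ? act as quantifiers ("one or more 9s", "optional 0"):
const unescaped = /what's 9+10?/;
console.log(unescaped.test("so what's 9+10?")); // false
console.log(unescaped.test("what's 9991"));     // true
```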
- Character classes
- Character classes are sets of characters, any one of which can match at that position. They are written by enclosing the characters in square brackets. To negate a character class, so that it matches any character not in the set, place a caret right after the opening square bracket. Ranges of letters, numbers, and Unicode characters can be declared using a hyphen. For example, [^a-z] matches everything but lowercase letters.
- There is shorthand for common character classes:
- \w: letters, numbers, and underscores ([A-Za-z0-9_])
- \d: numbers ([0-9])
- \s: whitespace characters. Spaces are the only whitespace that should appear in chat messages
- Matching the opposite of any of these only requires changing the lowercase letter to an uppercase one (e.g. \D matches everything but numbers)
- The dot (.) matches any character that can appear in chat messages. Avoid it whenever a character class can be more specific about what to match, which is nearly always the case with highlights.
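A few TypeScript checks illustrating character classes, the \d shorthand, and why a specific class beats the dot. The "gen" patterns are illustrative examples, not taken from any real highlight list.

```typescript
// [^a-z]: any single character that is NOT a lowercase letter.
console.log(/[^a-z]/.test("abc")); // false
console.log(/[^a-z]/.test("abC")); // true

// \d: a digit. "gen" followed by exactly one digit, as a whole word.
const anyGen = /\bgen\d\b/;
console.log(anyGen.test("gen9 ou when")); // true
console.log(anyGen.test("general chat")); // false

// Prefer a specific class over the dot: "." also accepts letters and symbols.
console.log(/\bgen.\b/.test("gens"));  // true (probably not intended)
console.log(/\bgen\d\b/.test("gens")); // false
```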
- Anchors and Boundaries
- There are two anchors: the caret (^) and the dollar sign ($). A caret asserts that the current position is the beginning of the chat message, while a dollar sign asserts that it is the end; for example, ^\/me\s highlights any message using /me.
- Word boundaries (\b) assert that the current position is between a word character and a non-word character, in either order. They are only needed in the middle of a highlight pattern, since PS already adds them to the beginning and end of the combined highlight regex.
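A TypeScript sketch of anchors and word boundaries, testing plain message strings. How the client feeds a message's text into the regex is an assumption here; the patterns themselves are standard JavaScript regex.

```typescript
// ^ anchors to the start of the message: only messages that begin with "/me ".
const meAction = /^\/me\s/;
console.log(meAction.test("/me waves"));           // true
console.log(meAction.test("just type /me waves")); // false

// $ anchors to the end: messages ending in a question mark.
const endsWithQuestion = /\?$/;
console.log(endsWithQuestion.test("anyone up for a battle?"));   // true
console.log(endsWithQuestion.test("? is a reserved character")); // false

// \b stops "sol" from matching inside a longer word.
console.log(/sol/.test("console"));     // true: "sol" appears inside "console"
console.log(/\bsol\b/.test("console")); // false
```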
- Groups
- Groups, written with parentheses, treat several characters as a single unit. This matters for the quantifiers below, which would otherwise only apply to the single character before them. Groups can be nested if needed.
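- Plain parentheses create a capturing group, which also stores the text it matched. Highlights only check whether a message matches, so non-capturing groups ((?:...)) should be used in the majority of cases where groups are needed.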
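A short TypeScript sketch of what grouping changes: the scope of a quantifier, and whether the matched text is stored. ^ and $ are used only so each test checks the whole string.

```typescript
// Without a group, ? applies only to the "s" before it.
console.log(/^solaris?$/.test("solari"));     // true
// With a group, ? applies to the whole "aris".
console.log(/^sol(?:aris)?$/.test("solari")); // false
console.log(/^sol(?:aris)?$/.test("sol"));    // true

// A capturing group also stores the text it matched; a non-capturing one does not.
console.log("solaris".match(/sol(aris)?/)?.length);   // 2: full match + "aris"
console.log("solaris".match(/sol(?:aris)?/)?.length); // 1: full match only
```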
- Quantifiers
- Question marks (?) make the preceding character or group optional, e.g. "sol", "solaris", and "solarisfox" can all be matched by sol(?:aris(?:fox)?)?.
- Asterisks (*) match the preceding character or group zero or more times, as many times as possible, so the regex still matches if it does not appear at all.
- Plus signs (+) match the preceding character or group one or more times, as many times as possible. If it does not appear at least once, that part of the regex fails to match.
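The quantifiers above can be checked in TypeScript. As before, ^ and $ are added only so each test checks the whole string; PS itself wraps highlights in word boundaries instead.

```typescript
// ? makes each group optional, so one pattern covers three spellings.
const solName = /^sol(?:aris(?:fox)?)?$/;
for (const word of ["sol", "solaris", "solarisfox", "solfox"]) {
  console.log(word, solName.test(word));
}
// sol true, solaris true, solarisfox true, solfox false

// * allows zero repeats; + requires at least one.
console.log(/^lo*l$/.test("ll"));     // true
console.log(/^lo+l$/.test("ll"));     // false
console.log(/^lo+l$/.test("looool")); // true
```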
- Alternation
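- The pipe (|) separates alternatives: the regex matches if any one of them matches, e.g. fox|wolf matches either "fox" or "wolf". Alternation has the lowest precedence of any operator, so gen8|9ou means "gen8" or "9ou"; wrap the alternatives in a group to limit their scope, e.g. gen(?:8|9)ou.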
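A TypeScript sketch of alternation and its low precedence. Without a group, | splits the whole pattern into "^gen8" or "9ou$"; a group restricts it to the digit. The "gen" patterns are illustrative examples only.

```typescript
// | has the lowest precedence: this means "^gen8" OR "9ou$".
const ungrouped = /^gen8|9ou$/;
console.log(ungrouped.test("gen8 randbats")); // true, probably not intended

// Grouping limits the alternation to the digit.
const grouped = /^gen(?:8|9)ou$/;
console.log(grouped.test("gen9ou"));        // true
console.log(grouped.test("gen8 randbats")); // false

// For single characters, a character class does the same job.
console.log(/^gen[89]ou$/.test("gen8ou")); // true
```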
- Lookaround (Lookahead and Lookbehind)
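- Positive lookahead ((?=...)) asserts that the text right after the current position matches the pattern inside it, without making that text part of the match. Negative lookahead ((?!...)) asserts that it does not match. For example, solaris(?!\sfox) highlights "solaris" but not "solaris fox".
- Lookbehind works the same way but checks the text before the current position: (?<=...) is positive and (?<!...) is negative. Lookahead and lookbehind together are called lookaround. Some older browsers do not support lookbehind, so highlights that rely on it may not work for everyone.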
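A TypeScript sketch of negative lookahead, positive lookahead, and lookbehind. The example phrases are hypothetical; the lookbehind regex is built from a string with new RegExp, and it needs a JavaScript engine that supports lookbehind.

```typescript
// Negative lookahead: "solaris", but not when followed by " fox".
const notFox = /\bsolaris(?!\sfox)\b/;
console.log(notFox.test("solaris said hi"));     // true
console.log(notFox.test("solaris fox said hi")); // false

// Positive lookahead: "gen" plus a digit, only when " ou" comes next.
// The lookahead checks " ou" without making it part of the match.
const genOu = /\bgen\d(?=\sou)/;
console.log("gen9 ou anyone?".match(genOu)?.[0]); // gen9
console.log(genOu.test("gen9 uu"));               // false

// Positive lookbehind: "fox" only when directly preceded by "@".
const atFox = new RegExp("(?<=@)fox\\b");
console.log(atFox.test("hi @fox")); // true
console.log(atFox.test("hi fox"));  // false
```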