Approved Harnessing the Power of Regular Expressions in Highlights

Morfent · Feb 25, 2015

SolarisFox made a pretty good thread on this here, but it's a bit lacking in information on some of the other features that can be used, like lazy repetition, lookahead and lookaround, as well as how to optimize regex patterns and how the client handles highlights. If he wants to, he can help co-write the article with me. I also think that making the guide an article in The Player rather than a post on the PS forum would give it a lot more exposure and help more people learn how to write more useful highlights.

What are highlights?
- Highlights are a list of words and phrases that the PS client searches chat messages for. If a message contains any of them, its background turns orange. By default, your username is highlighted.
What are regular expressions, and how would they help?
- Regular expressions, also known as regexes, are sequences of characters that describe a pattern of text, which the regex engine then searches for. Highlights work by combining the whole highlight list into one regex pattern and testing each chat message for a match. Because of this, regex patterns can be added to the highlight list directly. Why does this matter? If a highlight list contains several similar entries, they can usually be merged into one regex pattern, which makes the list much easier to manage. And if a highlight needs to match phrases that begin or end with a certain pattern, regex is absolutely necessary.
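To make the idea concrete, here's a rough TypeScript sketch of a highlight check. It is an assumption based on the description above, not the actual PS client code: the hypothetical buildHighlightRegex joins every entry with | inside a non-capturing group, wraps the result in \b, and ignores case. It then compares three separate entries against the single merged entry sol(?:aris(?:fox)?)?.

```typescript
// Hypothetical sketch, NOT the real Pokemon Showdown client code.
// Assumption: the client joins every entry with "|" inside a non-capturing
// group, wraps it in word boundaries (\b), and ignores case ("i" flag).
function buildHighlightRegex(highlights: string[]): RegExp {
  return new RegExp("\\b(?:" + highlights.join("|") + ")\\b", "i");
}

// A message is highlighted if the combined pattern matches anywhere in it.
function isHighlighted(message: string, highlights: string[]): boolean {
  return buildHighlightRegex(highlights).test(message);
}

// Three similar entries...
const separate: string[] = ["sol", "solaris", "solarisfox"];
// ...merged into a single regex entry.
const merged: string[] = ["sol(?:aris(?:fox)?)?"];

for (const msg of ["hi sol", "SolarisFox wrote a guide", "console log"]) {
  // "console" never matches: there, "sol" sits inside a longer word,
  // so the surrounding \b assertions fail.
  console.log(msg, isHighlighted(msg, separate), isHighlighted(msg, merged));
}
```

Each message should get the same result from both lists: true, true, and false. One caveat of this simplified wrapping: an entry that begins or ends with a non-word character, such as ^\/me\s or what's 9\+10\?, fails the outer \b at the very start or end of a message, so the real client may handle those edges differently.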
How do I write regular expressions?
1. Characters
  - All of the following characters are reserved, since each one performs a special function, and they need a backslash before them if the regex should match the literal character: \, ^, $, ., |, ?, *, +, [, {, (, and )
  - For example, to match what's 9+10?, the highlight would need to be what's 9\+10\?. Unescaped, 9+10? means "one or more 9s, then a 1, then an optional 0", so the highlight would match what's 91 but not the intended question.
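Here's a small TypeScript check of that example. It tests the raw pattern on its own, without any wrapping the client might add. escapeForHighlight is a hypothetical helper (not part of PS) that backslashes every reserved character, plus ] and } for safety, so any text can be matched literally.

```typescript
// Hypothetical helper: backslash-escapes every character with a special
// meaning in a regex, so the text is matched literally.
function escapeForHighlight(text: string): string {
  return text.replace(/[\\^$.|?*+()[\]{}]/g, "\\$&");
}

const message = "what's 9+10?";

// Unescaped: "9+" means "one or more 9s" and "0?" means "an optional 0".
const unescaped = new RegExp("what's 9+10?");
// Escaped by hand, as in the example above.
const escaped = new RegExp("what's 9\\+10\\?");

console.log(unescaped.test(message));     // false
console.log(escaped.test(message));       // true
console.log(escapeForHighlight(message)); // what's 9\+10\?
```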
2. Character classes
  - Character classes are sets of characters that a single position in the pattern can or cannot match. They are written by enclosing the characters in square brackets. To negate a character class, so that it matches any character not in the set, place a caret right after the opening square bracket. Ranges of letters, numbers, and Unicode characters can be declared with a hyphen. For example, [^a-z] matches any single character that is not a lowercase letter.
  - There is shorthand for common character classes:
    - \w: unaccented letters, digits, and underscores ([A-Za-z0-9_])
    - \d: digits ([0-9])
    - \s: whitespace. In practice, spaces are the only whitespace characters that should appear in chat messages
    - Matching the opposite of any of these only requires changing the lowercase letter to an uppercase one (e.g. \D matches everything but numbers)
  - The dot (.) matches any character that can appear in a chat message. Avoid it whenever a character class can describe the allowed characters more precisely, which is nearly always the case with highlights.
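A quick TypeScript comparison of a character class against the dot. These are plain, case-sensitive regex literals, so results could differ in the highlight list if the client ignores case; the gen patterns are only illustrations.

```typescript
// "gen" followed by exactly one digit.
const genDigit = /gen\d/;
// "gen" followed by any character at all: much looser.
const genAny = /gen./;
// Any single character that is not a lowercase letter.
const notLower = /[^a-z]/;
// Any single character that is not a digit (uppercase \D negates \d).
const notDigit = /\D/;

console.log(genDigit.test("gen7 ou")); // true
console.log(genDigit.test("gen? ou")); // false: "?" is not a digit
console.log(genAny.test("gen? ou"));   // true: the dot accepts "?"
console.log(notLower.test("abc"));     // false
console.log(notLower.test("abc1"));    // true
console.log(notDigit.test("123"));     // false
```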
3. Anchors and Boundaries
  - There are two anchors: the caret (^) and the dollar sign ($). A caret asserts that the current position is the beginning of the chat message, while a dollar sign asserts that it is the end, e.g. ^\/me\s highlights any message that uses /me.
  - A word boundary (\b) asserts that the current position lies between a word character and a non-word character, in either order (the start and end of the message count as non-word positions). You only need to write \b in the middle of a pattern, since PS already adds one to the beginning and end of the combined highlight regex.
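The anchors and \b can be tried out the same way in TypeScript. These are raw regex literals, which ignore any wrapping the client adds; inside a literal, the / has to be escaped as \/.

```typescript
// Message starts with "/me" followed by whitespace.
const meCommand = /^\/me\s/;
// Message ends with a literal question mark.
const endsWithQuestion = /\?$/;
// "sol" as a whole word, not inside a longer word.
const wholeSol = /\bsol\b/;

console.log(meCommand.test("/me waves"));           // true
console.log(meCommand.test("look: /me waves"));     // false: not at the start
console.log(endsWithQuestion.test("what's 9+10?")); // true
console.log(endsWithQuestion.test("what? no"));     // false: "?" is not last
console.log(wholeSol.test("hi sol!"));              // true
console.log(wholeSol.test("console"));              // false
```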
4. Groups
  - Parentheses group characters together. This matters for the operators below, which otherwise apply only to the single character right before them. Groups can be nested if needed.
  - Use non-capturing groups ((?:...)) in most cases where a group is needed. Highlights only check whether a message matches, so anything a capturing group saves is never used.
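A short TypeScript sketch of why grouping matters, using the + quantifier described below, and why capturing adds nothing to a yes/no check. firstMatch is a hypothetical helper that returns the matched text, or null.

```typescript
// Returns the text matched by `re` in `text`, or null if nothing matches.
function firstMatch(re: RegExp, text: string): string | null {
  const m = text.match(re);
  return m ? m[0] : null;
}

const letterRepeat = /ha+/;      // "h", then one or more "a"
const groupRepeat = /(?:ha)+/;   // "ha", repeated one or more times
const capturingRepeat = /(ha)+/; // same, but also saves the last "ha"

console.log(firstMatch(letterRepeat, "hahaha")); // ha
console.log(firstMatch(groupRepeat, "hahaha"));  // hahaha
console.log(firstMatch(letterRepeat, "haaaa"));  // haaaa
console.log(firstMatch(groupRepeat, "haaaa"));   // ha

// For a yes/no check like a highlight, capturing changes nothing.
console.log(groupRepeat.test("hahaha") === capturingRepeat.test("hahaha")); // true
```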
5. Quantifiers
  - A question mark (?) makes the preceding character or group optional, e.g. "sol", "solaris", and "solarisfox" can all be matched by sol(?:aris(?:fox)?)?.
  - An asterisk (*) matches the preceding character or group zero or more times, as many times as possible, so it succeeds whether or not the character appears.
  - A plus sign (+) matches the preceding character or group one or more times, as many times as possible. If it does not appear at least once, that part of the pattern fails to match.
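These TypeScript checks exercise all three quantifiers. Each pattern is anchored with ^ and $ only so that the test has to consume the whole string; a real highlight entry usually wouldn't be anchored like this.

```typescript
// ? : the group after "sol" may or may not appear.
const optional = /^sol(?:aris(?:fox)?)?$/;
// * : zero or more "o" between the two "l"s.
const zeroOrMore = /^lo*l$/;
// + : at least one "o" after "n".
const oneOrMore = /^no+$/;

for (const name of ["sol", "solaris", "solarisfox", "solfox"]) {
  console.log(name, optional.test(name)); // only "solfox" fails
}
console.log(zeroOrMore.test("ll"), zeroOrMore.test("looool")); // true true
console.log(oneOrMore.test("n"), oneOrMore.test("nooooo"));    // false true
```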
6. Alternation
  - Explain how to use alternation
7. Lookahead and Lookaround
  - Explain positive and negative lookahead
  - Briefly explain lookaround

Spy · Feb 25, 2015

This looks great. I'm sure not many people know about this, and it would be awesome to get some publicity for it. I'm cool with it.

Morfent · Feb 25, 2015

SolarisFox has agreed to co-write this with me.

antemortem · Mar 2, 2015

so many headers and symbols

this is really informative and I appreciate that, but it'll be a hit or miss as far as being a useful article. It might be more useful as a stickied resource somewhere...

fx · Mar 9, 2015

Spy · Mar 9, 2015

Actually f(x), I'm curious whether Morfent has any reasoning as to why he wanted a Player article over a stickied resource thread (hence why we asked). The feature would get a ton more publicity this way since no one knows about it, and it's entertaining to read about, kinda like how Darnell made an article on auto join and backgrounds.

Spy · Mar 9, 2015

Oops forgot to tag Morfent

Quarkz · Mar 9, 2015

I guess it could also be possible to just use the forum post as a completed WIP and treat it as a normal article from there. Most of the work is already done and we can probably get this out relatively quickly.

Morfent · Mar 12, 2015

I'd be fine with the forum post as an incomplete WIP because it's missing some information on a couple things, but Sol and I shouldn't have any trouble writing the rest.

Quarkz · Mar 13, 2015

Sounds good

fx · Mar 13, 2015

coolio then, i'll be approving this since it looks like it's set in stone
that being said solarisfox needs kiln access
