Regex Pattern Reference

A comprehensive library of battle-tested Regular Expressions for validation, data extraction, and text processing. Each pattern is copy-and-paste ready with detailed explanations.

Email Address

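^[\w.-]+@([\w-]+\.)+[\w-]{2,}$
Validates common email formats like user@domain.com. The hyphen sits at the end of the first character class so it is read as a literal rather than as a range (some engines reject [\w-\.] as an invalid range), and the final label accepts two or more characters so that longer top-level domains such as .museum also match.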

Phone Number (US)

^\(?([0-9]{3})\)?[-. ]?([0-9]{3})[-. ]?([0-9]{4})$
Matches US phone formats including (555) 555-5555, 555-555-5555, and 5555555555.

Strong Password

^(?=.*[A-Za-z])(?=.*\d)[A-Za-z\d]{8,}$
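Requires at least 8 characters with at least one letter and one digit. Only letters and digits are permitted, so passwords containing symbols or spaces are rejected; widen the final character class if those should be allowed.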

URL / Website

https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)
Matches both http and https URLs with optional www prefix and query parameters.

IPv4 Address

^((25[0-5]|(2[0-4]|1\d|[1-9]|)\d)\.?\b){4}$
Validates IP addresses from 0.0.0.0 to 255.255.255.255 with proper octet boundaries.

Date (YYYY-MM-DD)

^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$
Matches ISO 8601 date format. Note: Does not validate actual calendar dates like Feb 30.
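
Because the pattern checks only the shape of the string, a calendar check can be layered on top of it. The following is a minimal Python sketch, assuming the standard datetime module is acceptable for the second step. The helper name is_real_iso_date is illustrative and not part of this reference.

```python
import re
from datetime import date

# Shape check: the YYYY-MM-DD pattern from this entry.
ISO_DATE = re.compile(r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$")

def is_real_iso_date(text: str) -> bool:
    """Return True only if text has YYYY-MM-DD shape and names a real calendar day.

    Hypothetical helper for illustration.
    """
    if not ISO_DATE.match(text):
        return False
    year, month, day = map(int, text.split("-"))
    try:
        date(year, month, day)  # raises ValueError for days like Feb 30
    except ValueError:
        return False
    return True

print(is_real_iso_date("2024-02-29"))  # True  (leap year)
print(is_real_iso_date("2023-02-30"))  # False (right shape, no such day)
print(is_real_iso_date("2023-13-01"))  # False (rejected by the regex)
```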

Hex Color

^#?([a-fA-F0-9]{6}|[a-fA-F0-9]{3})$
Matches CSS hex colors in both short (#FFF) and long (#FFFFFF) formats. The leading # is optional, so FFF and FFFFFF also match.

Slug (URL Friendly)

^[a-z0-9]+(?:-[a-z0-9]+)*$
Validates URL slugs like "my-blog-post-title" for SEO-friendly URLs.

Username

^[a-z0-9_-]{3,16}$
Allows lowercase letters, numbers, underscores, and hyphens between 3-16 characters.

HTML Tag

<([a-z]+)([^<]+)*(?:>(.*)<\/\1>|\s+\/>)
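Matches a paired element such as <p>text</p> or a self-closing tag such as <br />, with lowercase tag names only. Warning: regex is not ideal for parsing HTML; use an HTML parser for anything beyond simple cases.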

Credit Card (Visa)

^4[0-9]{12}(?:[0-9]{3})?$
Validates Visa card numbers starting with 4, either 13 or 16 digits long.

MasterCard

^(?:5[1-5][0-9]{2}|222[1-9]|22[3-9][0-9]|2[3-6][0-9]{2}|27[01][0-9]|2720)[0-9]{12}$
Matches MasterCard numbers with current BIN ranges including 2-series cards.

Social Security Number

^(?!000|666|9\d{2})\d{3}-(?!00)\d{2}-(?!0000)\d{4}$
Validates SSN format while excluding invalid ranges like 000 and 666.

ZIP Code (US)

^\d{5}(-\d{4})?$
Matches US ZIP codes in both 5-digit (12345) and ZIP+4 (12345-6789) formats.

Time (24-hour)

^([01]?[0-9]|2[0-3]):[0-5][0-9]$
Validates 24-hour time format from 00:00 to 23:59. The leading zero on the hour is optional, so 9:05 also matches.

Time (12-hour)

^(0?[1-9]|1[0-2]):[0-5][0-9] ?(AM|PM|am|pm)$
Matches 12-hour time with AM/PM indicator like 12:30 PM or 9:45am.

Understanding Regular Expressions: A Complete Guide

Regular Expressions, commonly known as Regex or RegExp, are sequences of characters that define search patterns. They originated in theoretical computer science and formal language theory, but today they are an indispensable tool for every developer, data scientist, and system administrator. Whether you are validating user input in a web form, searching through log files for specific error patterns, or performing complex find-and-replace operations in your code editor, regex provides the precision and flexibility you need.

The patterns in this reference have been collected and refined over years of real-world usage. Each one has been tested against edge cases and common variations to ensure reliability. However, it is important to remember that regex for validation should often be paired with additional server-side validation, especially for critical data like emails where the only true validation is sending a confirmation message.

The Building Blocks of Regex

Every regular expression is constructed from a combination of literal characters and metacharacters. Literal characters match themselves exactly, so the pattern cat will match the string "cat" in "concatenate". Metacharacters, on the other hand, have special meanings that allow you to define complex patterns. Understanding these building blocks is the first step to regex mastery.

The most fundamental metacharacters are the anchors. The caret ^ matches the start of a string, while the dollar sign $ matches the end. When you write ^hello$, you are saying "the entire string must be exactly 'hello'". Without these anchors, the pattern would match "hello" anywhere within a larger string like "say hello world".
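
The effect of the anchors can be seen in a short Python sketch using the standard re module. The sample strings come from the paragraph above; the variable names are illustrative.

```python
import re

text = "say hello world"

# Unanchored: the pattern may match anywhere inside the string.
unanchored = re.search(r"hello", text)

# Anchored: the whole string must be exactly "hello".
anchored = re.search(r"^hello$", text)
exact = re.search(r"^hello$", "hello")

print(unanchored is not None)  # True
print(anchored is not None)    # False
print(exact is not None)       # True
```

In Python, re.fullmatch(r"hello", s) is another way to require that the pattern cover the entire string.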

Character Classes and Quantifiers

Character classes allow you to match any one character from a defined set. Square brackets [abc] create a character class that matches either 'a', 'b', or 'c'. You can use ranges like [a-z] for all lowercase letters, [0-9] for all digits, or combine them like [a-zA-Z0-9] for all alphanumeric characters. The shorthand \d matches a digit (equivalent to [0-9] in ASCII mode; Unicode-aware engines such as Python 3 may also match digits from other scripts), \w matches word characters (letters, digits, underscore), and \s matches whitespace.

Quantifiers specify how many times a pattern should repeat. The asterisk * means "zero or more times", the plus + means "one or more times", and the question mark ? means "zero or one time" (making the preceding element optional). Curly braces provide precise control: {3} means exactly three times, {2,5} means between two and five times, and {3,} means three or more times with no upper limit.
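
A few of these classes and quantifiers in action, as a small Python sketch. The product-code format is a made-up example for illustration only.

```python
import re

# Character class ranges plus a bounded quantifier.
# Hypothetical format: two uppercase letters followed by 3 to 5 digits.
product_code = re.compile(r"[A-Z]{2}[0-9]{3,5}")

samples = ["AB123", "AB12", "AB123456", "ab123"]
results = {s: product_code.fullmatch(s) is not None for s in samples}
print(results)
# {'AB123': True, 'AB12': False, 'AB123456': False, 'ab123': False}

# ? makes the preceding element optional.
print(bool(re.fullmatch(r"colou?r", "color")))   # True
print(bool(re.fullmatch(r"colou?r", "colour")))  # True

# * allows zero repetitions, + requires at least one.
print(bool(re.fullmatch(r"ab*", "a")))  # True
print(bool(re.fullmatch(r"ab+", "a")))  # False

# \w+ matches a run of word characters, skipping the whitespace between them.
words = re.findall(r"\w+", "tab\tand  space")
print(words)  # ['tab', 'and', 'space']
```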

Grouping and Capturing

Parentheses serve two purposes in regex: grouping and capturing. When you write (abc)+, the parentheses group the three characters together so the plus applies to the entire sequence, matching "abc", "abcabc", "abcabcabc", etc. At the same time, the matched content is "captured" and can be referenced later. In most regex engines, captured groups are numbered starting from 1, and you can reference them using \1, \2, etc.

If you need grouping without capturing (for performance or clarity), use the non-capturing group syntax (?:abc). This is particularly useful in complex patterns where you do not need to extract every grouped element.
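
The difference between capturing and non-capturing groups can be sketched in Python as follows. The sample strings are illustrative.

```python
import re

# Grouping: the + applies to the whole sequence "abc".
m = re.fullmatch(r"(abc)+", "abcabcabc")
print(m.group(0))  # abcabcabc
print(m.group(1))  # abc  (a repeated group keeps its last repetition)

# Backreference: \1 must repeat whatever group 1 captured.
doubled = re.search(r"\b(\w+) \1\b", "this is is a test")
print(doubled.group(1))  # is

# Non-capturing group: (?:ab)+ groups without creating a numbered capture,
# so findall returns only the digit captured by the one real group.
digits = re.findall(r"(?:ab)+(\d)", "abab1 ab2")
print(digits)  # ['1', '2']

# The compiled pattern reports how many capturing groups it has.
print(re.compile(r"(abc)+").groups)    # 1
print(re.compile(r"(?:abc)+").groups)  # 0
```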

Lookahead and Lookbehind Assertions

Lookaround assertions are zero-width assertions that match a position without consuming characters. A positive lookahead (?=...) asserts that what follows the current position matches the pattern. For example, foo(?=bar) matches "foo" only if it is followed by "bar", but "bar" is not included in the match. Negative lookahead (?!...) does the opposite, matching only when the pattern does not follow.

Similarly, lookbehind assertions (?<=...) and (?<!...) check what precedes the current position. Note that lookbehind support varies between regex engines; JavaScript only added support in ES2018, and some patterns have restrictions on what can appear inside lookbehinds.
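
The four lookaround forms can be tried in Python, whose standard re module supports all of them but requires lookbehind patterns to have a fixed width. The price string below is a made-up example.

```python
import re

# Positive lookahead: "foo" only when followed by "bar"; "bar" is not consumed.
ahead = re.search(r"foo(?=bar)", "foobar")
print(ahead.group())                        # foo
print(re.search(r"foo(?=bar)", "foobaz"))   # None

# Negative lookahead: "foo" only when NOT followed by "bar".
not_ahead = re.search(r"foo(?!bar)", "foobaz")
print(not_ahead.group())                    # foo

sample = "cost $42, qty 7"

# Positive lookbehind: digits immediately preceded by a dollar sign.
priced = re.findall(r"(?<=\$)\d+", sample)
print(priced)    # ['42']

# Negative lookbehind: whole numbers NOT preceded by a dollar sign.
unpriced = re.findall(r"(?<!\$)\b\d+", sample)
print(unpriced)  # ['7']
```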

Common Pitfalls and Best Practices

One of the most common mistakes is forgetting that quantifiers are greedy by default. The pattern .* will match as much text as possible, which can lead to unexpected results. Adding a question mark after a quantifier makes it lazy: .*? matches as little as possible. Understanding the difference between greedy and lazy matching is crucial for writing patterns that match what you intend.
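
A small Python sketch of the difference, using a made-up snippet of markup as input:

```python
import re

html = "<b>one</b> and <b>two</b>"

# Greedy: .* runs to the LAST "</b>" it can reach.
greedy = re.findall(r"<b>.*</b>", html)
print(greedy)  # ['<b>one</b> and <b>two</b>']

# Lazy: .*? stops at the FIRST "</b>" that lets the match succeed.
lazy = re.findall(r"<b>.*?</b>", html)
print(lazy)    # ['<b>one</b>', '<b>two</b>']
```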

Another pitfall is catastrophic backtracking, where poorly written patterns can cause the regex engine to take exponentially long to evaluate certain inputs. This is especially dangerous in applications that accept user input, as it can lead to denial-of-service vulnerabilities. Always test your patterns with adversarial inputs and consider using timeouts in production code.
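
The effect can be observed with a small timing sketch in Python. The pattern ^(a+)+$ is a standard textbook example of nested quantifiers rather than one of the patterns in this reference, and the helper time_match is illustrative. Input sizes are kept small on purpose, because each extra character roughly doubles the work for the risky pattern in a backtracking engine.

```python
import re
import time

RISKY = re.compile(r"^(a+)+$")  # nested quantifiers: many ways to split a run of "a"
SAFER = re.compile(r"^a+$")     # accepts the same strings, with only one way to match

def time_match(pattern, text):
    """Return (match result, elapsed seconds) for one match attempt."""
    start = time.perf_counter()
    result = pattern.match(text)
    return result, time.perf_counter() - start

# Near-miss input: a run of "a" followed by a character that forces failure.
for n in (10, 14, 18):
    text = "a" * n + "b"
    _, risky_seconds = time_match(RISKY, text)
    _, safer_seconds = time_match(SAFER, text)
    print(f"n={n:2d}  risky={risky_seconds:.5f}s  safer={safer_seconds:.5f}s")
```

Exact timings depend on the machine and engine, so none are shown here. The point to look for is the growth of the risky column as n increases while the safer column stays flat.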

Finally, remember that regex should not be used for everything. Parsing complex nested structures like HTML or JSON is better done with dedicated parsers. The famous saying "now you have two problems" reminds us that reaching for regex when a simpler solution exists can add unnecessary complexity to your code.

Language-Specific Considerations

While the core regex syntax is fairly consistent across programming languages, there are important differences to be aware of. JavaScript writes regex literals between forward slashes (/pattern/flags), while Python passes patterns as strings, conventionally raw strings (r"pattern"), to functions in the re module. Finding every match also differs: JavaScript uses the 'g' flag for global matching, while Python calls re.findall() or re.finditer(). Some features like atomic groups and possessive quantifiers are available in some engines but not others.
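
On the Python side, a short sketch shows raw strings for patterns, the re.findall function for collecting every match, and flag constants such as re.IGNORECASE. The log line is a made-up example.

```python
import re

text = "Error at line 12, error at line 40"

# A raw string keeps the backslash in \d intact for the regex engine.
all_numbers = re.findall(r"\d+", text)   # every match, as a list of strings
first_number = re.search(r"\d+", text)   # only the first match

print(all_numbers)           # ['12', '40']
print(first_number.group())  # 12

# Flags are passed as constants from the re module.
errors = re.findall(r"error", text, flags=re.IGNORECASE)
print(errors)                # ['Error', 'error']
```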

When working in multiple languages, it is helpful to test your patterns in a language-specific regex tester. Online tools like regex101.com allow you to select your target engine and see exactly how your pattern will behave.

Performance Optimization

For applications that need to match patterns thousands or millions of times, regex performance becomes critical. In languages that let you compile a pattern ahead of time, compiling once and reusing the compiled object avoids re-parsing the pattern on every call. Anchoring your patterns with ^ and $ when applicable allows the engine to fail fast. Using specific character classes like [a-z] instead of the more general . can also improve performance.
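
A minimal Python sketch of reusing a compiled pattern, with timeit from the standard library as a rough measuring tool. The sample data and function names are illustrative. Python's re module also keeps an internal cache of recently used patterns, so the gap between the two versions may be modest.

```python
import re
import timeit

# Compiled once, outside the loop, and anchored so mismatches fail early.
LOWER_WORD = re.compile(r"^[a-z]+$")

lines = ["alpha", "beta42", "gamma", "Delta"] * 1000

def with_compiled():
    """Count matching lines using the precompiled pattern object."""
    return sum(1 for line in lines if LOWER_WORD.match(line))

def with_module_call():
    """Count matching lines by passing the pattern string on every call."""
    return sum(1 for line in lines if re.match(r"^[a-z]+$", line))

print(with_compiled(), with_module_call())  # 2000 2000

# Timings vary by machine, so no expected values are shown.
print("compiled:   ", timeit.timeit(with_compiled, number=20))
print("module call:", timeit.timeit(with_module_call, number=20))
```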

If you find yourself needing to match one of many possible strings, consider whether a simple substring search or hash table lookup might be more efficient than a regex alternation like (apple|banana|cherry|...).
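
As a sketch of that trade-off in Python, the same membership question can be answered with a regex alternation or with a set lookup. The word list and function names are illustrative.

```python
import re

FRUITS = {"apple", "banana", "cherry"}

# Alternation built from the same words; fullmatch requires the whole string.
FRUIT_RE = re.compile(r"apple|banana|cherry")

def is_fruit_regex(word: str) -> bool:
    return FRUIT_RE.fullmatch(word) is not None

def is_fruit_set(word: str) -> bool:
    return word in FRUITS  # hash lookup, no pattern matching involved

for word in ("apple", "cherry", "grape"):
    print(word, is_fruit_regex(word), is_fruit_set(word))
# apple True True
# cherry True True
# grape False False

# For "does this text contain a fixed string", the in operator is enough.
print("cherry" in "cherry pie")  # True
```

The set version also stays readable as the list grows, whereas a long alternation becomes hard to maintain.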

Real-World Applications

Beyond validation, regex is invaluable for data extraction and transformation. Web scrapers use regex to pull specific pieces of information from HTML. Log analysis tools like grep and awk rely heavily on regex to filter and parse log entries. Text editors and IDEs use regex for powerful find-and-replace operations that would be impossible with simple text matching.

In data science, regex is used to clean and normalize text data before analysis. Extracting dates from various formats, standardizing phone numbers, removing unwanted characters—all of these tasks become manageable with the right regex patterns.
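
A minimal Python sketch of two such cleaning steps, assuming US-style ten-digit phone numbers and a single target format. The function names and the output format are illustrative choices, not a standard.

```python
import re
from typing import Optional

def normalize_us_phone(raw: str) -> Optional[str]:
    """Return the number as (AAA) BBB-CCCC, or None if it is not 10 digits.

    Hypothetical helper: strips every non-digit, then drops a leading
    country code of 1 if one is present.
    """
    digits = re.sub(r"[^0-9]", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    if len(digits) != 10:
        return None
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"

def collapse_whitespace(text: str) -> str:
    """Replace each run of whitespace with one space and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()

print(normalize_us_phone("555.555.5555"))     # (555) 555-5555
print(normalize_us_phone("+1 555 555 5555"))  # (555) 555-5555
print(normalize_us_phone("12345"))            # None
print(collapse_whitespace("  too   many\tspaces \n"))  # too many spaces
```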