Regex Pattern Reference
A comprehensive library of battle-tested Regular Expressions for validation, data extraction, and text processing. Each pattern is copy-and-paste ready with detailed explanations.
Email Address
^[\w.-]+@([\w-]+\.)+[\w-]{2,4}$
Phone Number (US)
^\(?([0-9]{3})\)?[-. ]?([0-9]{3})[-. ]?([0-9]{4})$
Strong Password
^(?=.*[A-Za-z])(?=.*\d)[A-Za-z\d]{8,}$
URL / Website
https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)
IPv4 Address
^((25[0-5]|(2[0-4]|1\d|[1-9]|)\d)\.?\b){4}$
Date (YYYY-MM-DD)
^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$
Hex Color
^#?([a-fA-F0-9]{6}|[a-fA-F0-9]{3})$
Slug (URL Friendly)
^[a-z0-9]+(?:-[a-z0-9]+)*$
Username
^[a-z0-9_-]{3,16}$
HTML Tag
<([a-z]+)([^<]+)*(?:>(.*)<\/\1>|\s+\/>)
Credit Card (Visa)
^4[0-9]{12}(?:[0-9]{3})?$
MasterCard
^(?:5[1-5][0-9]{2}|222[1-9]|22[3-9][0-9]|2[3-6][0-9]{2}|27[01][0-9]|2720)[0-9]{12}$
Social Security Number
^(?!000|666|9\d{2})\d{3}-(?!00)\d{2}-(?!0000)\d{4}$
ZIP Code (US)
^\d{5}(-\d{4})?$
Time (24-hour)
^([01]?[0-9]|2[0-3]):[0-5][0-9]$
Time (12-hour)
^(0?[1-9]|1[0-2]):[0-5][0-9] ?(AM|PM|am|pm)$
Understanding Regular Expressions: A Complete Guide
Regular Expressions, commonly known as Regex or RegExp, are sequences of characters that define search patterns. They originated in theoretical computer science and formal language theory, but today they are an indispensable tool for every developer, data scientist, and system administrator. Whether you are validating user input in a web form, searching through log files for specific error patterns, or performing complex find-and-replace operations in your code editor, regex provides the precision and flexibility you need.
The patterns in this reference have been collected and refined over years of real-world usage. Each one has been tested against edge cases and common variations to ensure reliability. However, it is important to remember that regex for validation should often be paired with additional server-side validation, especially for critical data like emails where the only true validation is sending a confirmation message.
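As a small illustration of pairing a pattern with a second check, the sketch below uses the Date (YYYY-MM-DD) pattern from this reference together with Python's standard datetime module. The function name is_valid_date is hypothetical, and the choice of Python is an assumption made for the examples in this guide.

```python
import re
from datetime import date

# Date (YYYY-MM-DD) pattern from the reference list.
DATE_RE = re.compile(r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$")

def is_valid_date(text: str) -> bool:
    """Return True only if text has the right shape AND names a real calendar day."""
    # Step 1: the regex checks the shape only (it cannot know month lengths).
    if DATE_RE.fullmatch(text) is None:
        return False
    # Step 2: a real date parser rejects impossible days such as February 31.
    try:
        date.fromisoformat(text)
    except ValueError:
        return False
    return True

print(is_valid_date("2024-02-29"))  # True: a real leap day
print(is_valid_date("2023-02-31"))  # False: right shape, impossible date
```

The regex alone would accept 2023-02-31, which is why the second step is there.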
The Building Blocks of Regex
Every regular expression is constructed from a combination of literal characters and metacharacters. Literal characters match themselves exactly, so the pattern cat will match the string "cat" in "concatenate". Metacharacters, on the other hand, have special meanings that allow you to define complex patterns. Understanding these building blocks is the first step to regex mastery.
The most fundamental metacharacters are the anchors. The caret ^ matches the start of a string, while the dollar sign $ matches the end. When you write ^hello$, you are saying "the entire string must be exactly 'hello'". Without these anchors, the pattern would match "hello" anywhere within a larger string like "say hello world".
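The difference anchors make can be seen in a short Python sketch (Python is an assumption here; the same patterns behave similarly in most engines):

```python
import re

text = "say hello world"

# Unanchored: "hello" may appear anywhere in the string.
unanchored = re.search(r"hello", text)
# Anchored: the entire string must be exactly "hello".
anchored = re.search(r"^hello$", text)
exact = re.search(r"^hello$", "hello")

print(unanchored is not None)  # True
print(anchored is not None)    # False
print(exact is not None)       # True

# A literal pattern matches itself wherever it occurs.
cat_span = re.search(r"cat", "concatenate").span()
print(cat_span)  # (3, 6)
```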
Character Classes and Quantifiers
Character classes allow you to match any one character from a defined set. Square brackets [abc] create a character class that matches either 'a', 'b', or 'c'. You can use ranges like [a-z] for all lowercase letters, [0-9] for all digits, or combine them like [a-zA-Z0-9] for all ASCII alphanumeric characters. The shorthand \d matches a digit (equivalent to [0-9] in ASCII mode, although some Unicode-aware engines also accept digits from other scripts), \w matches word characters (letters, digits, underscore), and \s matches whitespace.
Quantifiers specify how many times a pattern should repeat. The asterisk * means "zero or more times", the plus + means "one or more times", and the question mark ? means "zero or one time" (making the preceding element optional). Curly braces provide precise control: {3} means exactly three times, {2,5} means between two and five times, and {3,} means three or more times with no upper limit.
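A few of these classes and quantifiers in action, as a minimal Python sketch (the sample strings are made up for illustration):

```python
import re

# Character class with a range, plus {3}: exactly three lowercase letters.
three_letters = re.fullmatch(r"[a-z]{3}", "abc") is not None   # True
too_long = re.fullmatch(r"[a-z]{3}", "abcd") is not None       # False

# ? makes the preceding element optional.
spellings = re.findall(r"colou?r", "color colour")   # ['color', 'colour']

# {2,5}: between two and five digits, taking as many as allowed.
digit_runs = re.findall(r"\d{2,5}", "7 42 123456")   # ['42', '12345']

# \s+ : one or more whitespace characters.
words = re.split(r"\s+", "a  b\tc")                  # ['a', 'b', 'c']

# * allows zero repetitions, + requires at least one.
star = re.fullmatch(r"ab*", "a") is not None         # True
plus = re.fullmatch(r"ab+", "a") is not None         # False

print(three_letters, too_long, spellings, digit_runs, words, star, plus)
```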
Grouping and Capturing
Parentheses serve two purposes in regex: grouping and capturing. When you write (abc)+, the parentheses group the three characters together so the plus applies to the entire sequence, matching "abc", "abcabc", "abcabcabc", etc. At the same time, the matched content is "captured" and can be referenced later. In most regex engines, captured groups are numbered starting from 1, and you can reference them using \1, \2, etc.
If you need grouping without capturing (for performance or clarity), use the non-capturing group syntax (?:abc). This is particularly useful in complex patterns where you do not need to extract every grouped element.
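The following Python sketch shows capturing groups, a backreference, and a non-capturing group side by side. The sample strings are invented for illustration.

```python
import re

# Capturing group repeated with +: the plus applies to the whole sequence "abc".
m = re.fullmatch(r"(abc)+", "abcabc")
print(m.group(0))  # abcabc  (the whole match)
print(m.group(1))  # abc     (a repeated group keeps only its last iteration)

# Backreference \1: find a word that is immediately repeated.
doubled = re.search(r"\b(\w+) \1\b", "this is is a test")
print(doubled.group(1))  # is

# Non-capturing group: (?:ab)+ groups without creating a numbered capture,
# so the digit is the only captured group.
digits = re.findall(r"(?:ab)+(\d)", "abab1 ab2")
print(digits)  # ['1', '2']

# With only a non-capturing group there are no captures at all.
no_groups = re.fullmatch(r"(?:abc)+", "abcabc").groups()
print(no_groups)  # ()
```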
Lookahead and Lookbehind Assertions
Lookaround assertions are zero-width assertions that match a position without consuming characters. A positive lookahead (?=...) asserts that what follows the current position matches the pattern. For example, foo(?=bar) matches "foo" only if it is followed by "bar", but "bar" is not included in the match. Negative lookahead (?!...) does the opposite, matching only when the pattern does not follow.
Similarly, lookbehind assertions (?<=...) and (?<!...) check what precedes the current position. Note that lookbehind support varies between regex engines; JavaScript only added support in ES2018, and some patterns have restrictions on what can appear inside lookbehinds.
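The four lookaround forms can be tried out in Python, as sketched below. Note that Python's built-in re module requires lookbehind patterns to have a fixed width, which the examples here respect; the sample strings are invented.

```python
import re

# Positive lookahead: "foo" only when followed by "bar"; "bar" is not consumed.
pos_ahead = re.search(r"foo(?=bar)", "foobaz foobar")
print(pos_ahead.group(0), pos_ahead.span())  # foo (7, 10)

# Negative lookahead: "foo" only when NOT followed by "bar".
neg_ahead = re.search(r"foo(?!bar)", "foobar foobaz")
print(neg_ahead.span())  # (7, 10)

# Positive lookbehind: digits that come right after a dollar sign.
after_dollar = re.findall(r"(?<=\$)\d+", "cost $40 or 50 EUR")
print(after_dollar)  # ['40']

# Negative lookbehind: whole numbers NOT preceded by a dollar sign.
not_after_dollar = re.findall(r"(?<!\$)\b\d+", "cost $40 or 50 EUR")
print(not_after_dollar)  # ['50']
```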
Common Pitfalls and Best Practices
One of the most common mistakes is forgetting that quantifiers are greedy by default. The pattern .* will match as much text as possible, which can lead to unexpected results. Adding a question mark after a quantifier makes it lazy: .*? matches as little as possible. Understanding the difference between greedy and lazy matching is crucial for writing correct and efficient patterns.
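Greedy and lazy matching give visibly different results on the same input. A minimal Python sketch, using a made-up snippet of markup:

```python
import re

html = "<b>one</b> and <b>two</b>"

# Greedy: .* runs to the LAST </b>, swallowing everything in between.
greedy = re.findall(r"<b>.*</b>", html)
print(greedy)  # ['<b>one</b> and <b>two</b>']

# Lazy: .*? stops at the FIRST </b> it can reach.
lazy = re.findall(r"<b>.*?</b>", html)
print(lazy)    # ['<b>one</b>', '<b>two</b>']
```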
Another pitfall is catastrophic backtracking, where poorly written patterns can cause the regex engine to take exponentially long to evaluate certain inputs. This is especially dangerous in applications that accept user input, as it can lead to denial-of-service vulnerabilities. Always test your patterns with adversarial inputs and consider using timeouts in production code.
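The sketch below compares a pattern with nested quantifiers against an equivalent one with a single quantifier, timing both on inputs that almost match. The patterns and the helper time_match are illustrative assumptions, and the timings will vary by machine. Python's built-in re module does not take a timeout argument, so in Python a cap on input length or a rewritten pattern is the more practical safeguard.

```python
import re
import time

RISKY = re.compile(r"^(a+)+$")  # nested quantifiers: many ways to split a run of a's
SAFER = re.compile(r"^a+$")     # accepts the same strings with a single quantifier

def time_match(pattern: re.Pattern, text: str) -> float:
    """Return the seconds spent attempting one match (hypothetical helper)."""
    start = time.perf_counter()
    pattern.match(text)
    return time.perf_counter() - start

for n in (10, 14, 18):
    # A trailing "b" makes the match fail, which forces the engine to backtrack.
    text = "a" * n + "b"
    risky_s = time_match(RISKY, text)
    safer_s = time_match(SAFER, text)
    print(f"n={n}: risky={risky_s:.4f}s safer={safer_s:.6f}s")
```

On a backtracking engine, the time for the risky pattern should grow much faster than for the safer one as n increases, even though both reject the same inputs.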
Finally, remember that regex should not be used for everything. Parsing complex nested structures like HTML or JSON is better done with dedicated parsers. The famous saying "now you have two problems" reminds us that reaching for regex when a simpler solution exists can add unnecessary complexity to your code.
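As one example of reaching for a parser instead, this sketch collects link targets with Python's standard html.parser module rather than the HTML Tag pattern. The class name LinkCollector and the sample markup are hypothetical.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag, however deeply nested."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)

collector = LinkCollector()
collector.feed('<div><a href="/one">1</a><p><a class="x" href="/two">2</a></p></div>')
print(collector.links)  # ['/one', '/two']
```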
Language-Specific Considerations
While the core regex syntax is fairly consistent across programming languages, there are important differences to be aware of. JavaScript has regex literals delimited by forward slashes (/pattern/flags), while Python has no literal syntax and takes patterns as strings, conventionally raw strings (r"pattern"), passed to the re module. Flags also work differently: JavaScript uses the 'g' flag for global matching, whereas Python has no such flag and instead provides functions such as re.findall and re.finditer to return every match. Some features like atomic groups and possessive quantifiers are available in some engines but not others.
When working in multiple languages, it is helpful to test your patterns in a language-specific regex tester. Online tools like regex101.com allow you to select your target engine and see exactly how your pattern will behave.
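On the Python side, finding every match is done with a function rather than a flag, and raw strings keep backslashes intact. A brief sketch, with the JavaScript comparisons in the comments given as rough counterparts rather than exact equivalents:

```python
import re

text = "Cat, cat, CAT"

# Rough counterpart of JavaScript's text.match(/cat/gi): every match, ignoring case.
all_matches = re.findall(r"cat", text, flags=re.IGNORECASE)
print(all_matches)  # ['Cat', 'cat', 'CAT']

# re.search stops at the first match, like a JavaScript regex without the g flag.
first = re.search(r"cat", text, flags=re.IGNORECASE)
print(first.group(0))  # Cat

# Raw strings matter: "\b" is a single backspace character,
# while r"\b" is the two-character word-boundary escape.
lengths = (len("\b"), len(r"\b"))
print(lengths)  # (1, 2)

whole_words = re.findall(r"\bcat\b", "cat concat cat")
print(whole_words)  # ['cat', 'cat']
```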
Performance Optimization
For applications that need to match patterns thousands or millions of times, regex performance becomes critical. In languages that let you precompile a pattern, compile it once and reuse the compiled object rather than rebuilding it on every call. Anchoring your patterns with ^ and $ when applicable allows the engine to fail fast. Using specific character classes like [a-z] instead of the more general . can also improve performance by reducing the number of candidate matches the engine has to try.
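In Python, precompiling looks like the sketch below, which reuses the ZIP Code (US) pattern from this reference. The function name count_valid_zips is hypothetical. Python's re module also caches recently used patterns internally, so the gain from compiling explicitly may be modest; it mainly keeps the pattern in one named place.

```python
import re

# Compiled once at import time, then reused for every call.
ZIP_RE = re.compile(r"^\d{5}(-\d{4})?$")

def count_valid_zips(values):
    """Count the strings that are well-formed US ZIP or ZIP+4 codes."""
    return sum(1 for value in values if ZIP_RE.fullmatch(value))

sample = ["12345", "12345-6789", "1234", "abcde"]
print(count_valid_zips(sample))  # 2
```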
If you find yourself needing to match one of many possible strings, consider whether a simple substring search or hash table lookup might be more efficient than a regex alternation like (apple|banana|cherry|...).
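For exact whole-string membership, a set lookup gives the same answer as an anchored alternation. A minimal sketch, with hypothetical names FRUITS, is_fruit_regex, and is_fruit_set:

```python
import re

FRUITS = {"apple", "banana", "cherry"}

# Alternation built from the same words; re.escape guards against metacharacters.
FRUIT_RE = re.compile("|".join(re.escape(word) for word in sorted(FRUITS)))

def is_fruit_regex(word: str) -> bool:
    # fullmatch requires the whole string to be one of the alternatives.
    return FRUIT_RE.fullmatch(word) is not None

def is_fruit_set(word: str) -> bool:
    # Hash lookup: no pattern matching involved.
    return word in FRUITS

for candidate in ("apple", "cherry", "grape", "applesauce"):
    print(candidate, is_fruit_regex(candidate), is_fruit_set(candidate))
```

Both functions agree on every input; the set version is simpler to read and does not grow more complex as the word list gets longer.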
Real-World Applications
Beyond validation, regex is invaluable for data extraction and transformation. Web scrapers use regex to pull specific pieces of information from HTML. Log analysis tools like grep and awk rely heavily on regex to filter and parse log entries. Text editors and IDEs use regex for powerful find-and-replace operations that would be impossible with simple text matching.
In data science, regex is used to clean and normalize text data before analysis. Extracting dates from various formats, standardizing phone numbers, and removing unwanted characters all become manageable with the right regex patterns.
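Standardizing phone numbers, for instance, might look like the following sketch. It uses a variant of the Phone Number (US) pattern from this reference, assuming the separator class is meant to allow a hyphen, dot, or space; the function name normalize_phone and the output format are hypothetical choices.

```python
import re

# Variant of the US phone pattern; separators assumed to be hyphen, dot, or space.
PHONE_RE = re.compile(r"^\(?([0-9]{3})\)?[-. ]?([0-9]{3})[-. ]?([0-9]{4})$")

def normalize_phone(raw: str):
    """Return the number as NNN-NNN-NNNN, or None if it does not look like a US number."""
    match = PHONE_RE.fullmatch(raw.strip())
    if match is None:
        return None
    # The three capturing groups hold area code, prefix, and line number.
    return "-".join(match.groups())

for raw in ["(555) 123-4567", "555.123.4567", " 5551234567 ", "12345"]:
    print(repr(raw), "->", normalize_phone(raw))
```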