Email Transformers

Clean, validate, and extract information from email addresses.

Usage


| email                  | standardized           | username   | domain           | is_valid   |
| ---------------------- | ---------------------- | ---------- | ---------------- | ---------- |
| John.Doe@Gmail.COM     | john.doe@gmail.com     | john.doe   | gmail.com        | true       |
| JANE.SMITH@OUTLOOK.COM | jane.smith@outlook.com | jane.smith | outlook.com      | true       |
| info@company-name.org  | info@company-name.org  | info       | company-name.org | true       |
| invalid.email@         | null                   | null       | null             | false      |
| user+tag@domain.co.uk  | user+tag@domain.co.uk  | user+tag   | domain.co.uk     | true       |
| bad email@test.com     | null                   | null       | null             | false      |

Installation

datacompose add emails

API Reference

Extract Functions

emails.extract_email

Extract the first valid email address from text.

Parameters

| Property | Type | Description |
| --- | --- | --- |
| col (required) | Column | Column containing text with potential email addresses |

emails.extract_all_emails

Extract all email addresses from text as an array.

Parameters

| Property | Type | Description |
| --- | --- | --- |
| col (required) | Column | Column containing text with potential email addresses |
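To illustrate what extraction involves, here is a minimal plain-Python sketch. It finds the first email-like substring in a string, and all of them. The regex and the names first_email and all_emails are illustrative assumptions, not datacompose's actual pattern or API. The documented functions operate on columns rather than single strings.

```python
import re

# Simplified pattern (an assumption, much stricter than RFC 5322): a local part,
# "@", one or more dot-separated labels, and an alphabetic TLD of 2+ letters.
EMAIL_RE = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}"
)


def first_email(text):
    """Return the first email-like substring in text, or None if there is none."""
    match = EMAIL_RE.search(text or "")
    return match.group(0) if match else None


def all_emails(text):
    """Return every email-like substring in text as a list (empty if none)."""
    return EMAIL_RE.findall(text or "")


note = "Write to sales@example.com or help@example.org."
print(first_email(note))  # sales@example.com
print(all_emails(note))   # ['sales@example.com', 'help@example.org']
```

The trailing period after help@example.org is not captured, because the pattern must end on a letter of the TLD.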

emails.extract_username

Extract the username (local part) from an email address.

Parameters

| Property | Type | Description |
| --- | --- | --- |
| col (required) | Column | Column containing email address |

emails.extract_domain

Extract the domain from an email address.

Parameters

| Property | Type | Description |
| --- | --- | --- |
| col (required) | Column | Column containing email address |

emails.extract_domain_name

Extract the domain name, without the TLD, from an email address.

Parameters

| Property | Type | Description |
| --- | --- | --- |
| col (required) | Column | Column containing email address |

emails.extract_tld

Extract the top-level domain (TLD) from an email address.

Parameters

| Property | Type | Description |
| --- | --- | --- |
| col (required) | Column | Column containing email address |
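The four part-extraction functions above split an address at the "@" and at the dots of the domain. This plain-Python sketch shows a naive version of that split. The name split_email is hypothetical. The sketch splits on the last "@" and the last "." only, so a multi-part suffix such as co.uk is not treated as one TLD. How the library handles such suffixes is not stated here.

```python
def split_email(email):
    """Split an address into username, domain, domain name, and TLD.

    Naive sketch (an assumption): splits on the last "@" and the last "."
    only, so "domain.co.uk" yields domain_name "domain.co" and tld "uk".
    """
    if not email or "@" not in email:
        return None
    username, domain = email.rsplit("@", 1)
    domain_name, _, tld = domain.rpartition(".")
    return {
        "username": username,
        "domain": domain,
        "domain_name": domain_name,
        "tld": tld,
    }


print(split_email("info@company-name.org"))
# {'username': 'info', 'domain': 'company-name.org', 'domain_name': 'company-name', 'tld': 'org'}
```

A version that handles multi-part suffixes correctly would need a public suffix list, not a simple split.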

emails.extract_name_from_email

Attempt to extract a person's name from the email username, e.g., john.smith@example.com -> "John Smith".

Parameters

| Property | Type | Description |
| --- | --- | --- |
| col (required) | Column | Column containing email address |
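Name extraction is necessarily a heuristic. This sketch shows one plausible approach, not the library's documented rule. It drops any +tag, splits the username on ".", "_" or "-", discards pieces that are not purely alphabetic, and title-cases the rest. The function name name_from_email and the choice of separators are assumptions.

```python
import re


def name_from_email(email):
    """Guess a display name from the username, e.g. john.smith -> John Smith.

    Heuristic sketch (assumptions): ignore any +tag, split on ".", "_" or "-",
    keep only alphabetic pieces, and return None if nothing usable remains.
    """
    if not email or "@" not in email:
        return None
    username = email.split("@", 1)[0].split("+", 1)[0]
    parts = [p for p in re.split(r"[._-]+", username) if p.isalpha()]
    return " ".join(p.capitalize() for p in parts) or None


print(name_from_email("john.smith@example.com"))  # John Smith
```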

Transform Functions

emails.standardize_email

Apply standard email cleaning and normalization; the flag parameters below control which steps are applied.

Parameters

| Property | Type | Description |
| --- | --- | --- |
| col (required) | Column | Column containing email address |
| lowercase (required) | Column | Convert to lowercase |
| remove_dots_gmail (required) | Column | Remove dots from Gmail addresses |
| remove_plus (required) | Column | Remove plus addressing |
| fix_typos (required) | Column | Fix common domain typos |
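This plain-Python sketch shows how flag-controlled standardization might compose the steps listed above. Several details are assumptions rather than documented behavior: the step order, the default flag values, the tiny TYPO_DOMAINS table, the set of Gmail domains, and the function name standardize. Like the Usage table, the sketch returns None (null) for obviously malformed input.

```python
# Hypothetical typo table for illustration; not the library's DOMAIN_TYPO_MAPPINGS.
TYPO_DOMAINS = {"gmial.com": "gmail.com", "hotmial.com": "hotmail.com"}
# Assumed set of domains that get Gmail-specific dot removal.
GMAIL_DOMAINS = {"gmail.com", "googlemail.com"}


def standardize(email, lowercase=True, remove_dots_gmail=False,
                remove_plus=False, fix_typos=False):
    """Trim, reject obviously malformed input, then apply the enabled steps.

    The step order and the default flag values are assumptions.
    """
    if email is None:
        return None
    email = email.strip()
    # Reject inner whitespace, or a missing/empty username or domain.
    if "@" not in email or any(ch.isspace() for ch in email):
        return None
    username, domain = email.rsplit("@", 1)
    if not username or not domain:
        return None
    if lowercase:
        username, domain = username.lower(), domain.lower()
    if fix_typos:
        domain = TYPO_DOMAINS.get(domain.lower(), domain)
    if remove_plus:
        username = username.split("+", 1)[0]
    if remove_dots_gmail and domain.lower() in GMAIL_DOMAINS:
        username = username.replace(".", "")
    return f"{username}@{domain}"


print(standardize("John.Doe@Gmail.COM"))  # john.doe@gmail.com
print(standardize("John.Doe+news@gmial.com", remove_dots_gmail=True,
                  remove_plus=True, fix_typos=True))  # johndoe@gmail.com
```

Fixing typos before the Gmail check matters here. The misspelled gmial.com only qualifies for dot removal after it has been corrected.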

Validation Functions

emails.is_valid_email

Check whether an email address has a valid format.

Parameters

| Property | Type | Description |
| --- | --- | --- |
| col (required) | Column | Column containing email address |
| min_length (required) | Column | Minimum length for valid email |
| max_length (required) | Column | Maximum length for valid email |
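A format check combined with length bounds can be sketched as follows. The regex is a deliberately simple stand-in, not the library's rule. The defaults min_length=6 and max_length=254 are assumptions; 254 is the commonly cited practical maximum length of an address. The name is_valid is hypothetical.

```python
import re

# Deliberately simple format rule (an assumption); the full RFC 5322 grammar
# accepts far more, and the library's exact rule is not documented here.
FORMAT_RE = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}"
)


def is_valid(email, min_length=6, max_length=254):
    """True when the whole string matches FORMAT_RE and its length is in range."""
    if email is None:
        return False
    if not min_length <= len(email) <= max_length:
        return False
    return FORMAT_RE.fullmatch(email) is not None


for candidate in ["info@company-name.org", "invalid.email@", "bad email@test.com"]:
    print(candidate, is_valid(candidate))
```

fullmatch is used instead of search so that an address embedded in other text, or one with a stray space, is rejected.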

emails.is_valid_username

Check whether the username (local part) of an email address is valid.

Parameters

| Property | Type | Description |
| --- | --- | --- |
| col (required) | Column | Column containing email address |
| min_length (required) | Column | Minimum length for valid username |
| max_length (required) | Column | Maximum length for valid username |

emails.is_valid_domain

Check whether the domain part of an email address is valid.

Parameters

| Property | Type | Description |
| --- | --- | --- |
| col (required) | Column | Column containing email address |
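One common way to validate a domain is label by label, using the DNS letter-digit-hyphen rule. Each label is 1 to 63 characters and cannot start or end with a hyphen, and the whole name is at most 253 characters. This sketch applies that rule. Two things are assumptions: that the library applies exactly this rule, and that a domain must have at least two labels. The name has_valid_domain is hypothetical.

```python
import re

# LDH rule for a DNS label: 1 to 63 letters, digits or hyphens, not starting
# or ending with a hyphen. Applying exactly this rule here is an assumption.
LABEL_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def has_valid_domain(email):
    """Check the part after the last "@" label by label."""
    if not email or "@" not in email:
        return False
    domain = email.rsplit("@", 1)[1]
    labels = domain.split(".")
    return (
        len(domain) <= 253          # overall length limit for a domain name
        and len(labels) >= 2        # require at least name + TLD (assumption)
        and all(LABEL_RE.fullmatch(label) for label in labels)
    )


print(has_valid_domain("info@company-name.org"))  # True
print(has_valid_domain("user@-bad-.com"))         # False
```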

emails.has_plus_addressing

Check whether an email address uses plus addressing (e.g., user+tag@gmail.com).

Parameters

| Property | Type | Description |
| --- | --- | --- |
| col (required) | Column | Column containing email address |

emails.is_disposable_email

Check whether an email address is from a disposable email service.

Parameters

| Property | Type | Description |
| --- | --- | --- |
| col (required) | Column | Column containing email address |
| disposable_domains (required) | Column | List of disposable domains to check against |

emails.is_corporate_email

Check whether an email address appears to be from a corporate domain (that is, not from a free email provider).

Parameters

| Property | Type | Description |
| --- | --- | --- |
| col (required) | Column | Column containing email address |
| free_providers (required) | Column | List of free email provider domains to check against |
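Both classifications above reduce to a membership test on the lowercased domain. In this sketch, the two domain sets are small illustrative samples, not the library's built-in lists (if it has any). The names domain_of, is_disposable, and is_corporate are hypothetical.

```python
# Illustrative lists only (assumptions); in practice pass your own lists
# through disposable_domains / free_providers.
DISPOSABLE = {"mailinator.com", "10minutemail.com"}
FREE_PROVIDERS = {"gmail.com", "yahoo.com", "outlook.com", "hotmail.com"}


def domain_of(email):
    """Lowercased text after the last "@", or None if there is no "@"."""
    if not email or "@" not in email:
        return None
    return email.rsplit("@", 1)[1].lower()


def is_disposable(email, disposable_domains=DISPOSABLE):
    return domain_of(email) in disposable_domains


def is_corporate(email, free_providers=FREE_PROVIDERS):
    domain = domain_of(email)
    return domain is not None and domain not in free_providers


print(is_disposable("temp@Mailinator.com"))    # True
print(is_corporate("info@company-name.org"))   # True
print(is_corporate("JANE.SMITH@OUTLOOK.COM"))  # False
```

As described, the corporate check only excludes free providers, so a disposable domain would still count as "corporate". Combine the two checks if that matters.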

Utility Functions

emails.remove_whitespace

Remove all whitespace from an email address.

Parameters

| Property | Type | Description |
| --- | --- | --- |
| col (required) | Column | Column containing email address |

emails.lowercase_email

Convert the entire email address to lowercase.

Parameters

| Property | Type | Description |
| --- | --- | --- |
| col (required) | Column | Column containing email address |

emails.lowercase_domain

Convert only the domain part to lowercase, preserving the case of the username.

Parameters

| Property | Type | Description |
| --- | --- | --- |
| col (required) | Column | Column containing email address |

emails.remove_plus_addressing

Remove plus addressing from email (e.g., user+tag@gmail.com -> user@gmail.com).

Parameters

| Property | Type | Description |
| --- | --- | --- |
| col (required) | Column | Column containing email address |
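Plus addressing puts a tag between a "+" and the "@". Detecting it and removing it are both simple string operations on the username. This sketch, with the hypothetical names has_plus and remove_plus, applies to any domain. Whether the library restricts removal to particular providers is not stated.

```python
def has_plus(email):
    """True when the username (text before the last "@") contains a "+"."""
    return bool(email) and "@" in email and "+" in email.rsplit("@", 1)[0]


def remove_plus(email):
    """Drop everything from the first "+" up to the "@" in the username."""
    if not email or "@" not in email:
        return email
    username, domain = email.rsplit("@", 1)
    return f"{username.split('+', 1)[0]}@{domain}"


print(has_plus("user+tag@gmail.com"))     # True
print(remove_plus("user+tag@gmail.com"))  # user@gmail.com
```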

emails.remove_dots_from_gmail

Remove dots from Gmail addresses (Gmail ignores dots in usernames).

Parameters

| Property | Type | Description |
| --- | --- | --- |
| col (required) | Column | Column containing email address |

emails.fix_common_typos

Fix common domain typos in email addresses.

Parameters

| Property | Type | Description |
| --- | --- | --- |
| col (required) | Column | Column containing email address |
| custom_mappings (required) | Column | Additional domain mappings to apply (extends DOMAIN_TYPO_MAPPINGS) |
| custom_tld_mappings (required) | Column | Additional TLD mappings to apply (extends TLD_TYPO_MAPPINGS) |
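This sketch shows one way custom mappings could "extend" built-in tables: by merging dictionaries before the lookup. The default tables below are hypothetical stand-ins for DOMAIN_TYPO_MAPPINGS and TLD_TYPO_MAPPINGS, whose real contents are not shown in this reference. Letting custom entries override defaults on key clashes is also an assumption.

```python
# Hypothetical stand-ins for DOMAIN_TYPO_MAPPINGS and TLD_TYPO_MAPPINGS.
DEFAULT_DOMAIN_TYPOS = {"gmial.com": "gmail.com", "yaho.com": "yahoo.com"}
DEFAULT_TLD_TYPOS = {"con": "com", "cmo": "com"}


def fix_typos(email, custom_mappings=None, custom_tld_mappings=None):
    """Correct whole-domain typos first, then a misspelled final TLD."""
    if not email or "@" not in email:
        return email
    # Custom entries extend the defaults (and override them on key clashes).
    domain_map = {**DEFAULT_DOMAIN_TYPOS, **(custom_mappings or {})}
    tld_map = {**DEFAULT_TLD_TYPOS, **(custom_tld_mappings or {})}
    username, domain = email.rsplit("@", 1)
    domain = domain_map.get(domain.lower(), domain)
    name, dot, tld = domain.rpartition(".")
    if dot:
        domain = name + dot + tld_map.get(tld.lower(), tld)
    return f"{username}@{domain}"


print(fix_typos("ann@gmial.com"))    # ann@gmail.com
print(fix_typos("ann@example.con"))  # ann@example.com
print(fix_typos("ann@acme.comm", custom_tld_mappings={"comm": "com"}))  # ann@acme.com
```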

emails.normalize_gmail

Normalize Gmail addresses (remove dots, plus addressing, lowercase).

Parameters

| Property | Type | Description |
| --- | --- | --- |
| col (required) | Column | Column containing email address |
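Gmail ignores dots in usernames, so dot removal should only touch Gmail addresses. Normalization adds lowercasing and plus-tag removal on top of that. This sketch covers both remove_dots_from_gmail and normalize_gmail. Three details are assumptions: the set of Gmail domains (including googlemail.com), leaving non-Gmail input unchanged, and the names remove_gmail_dots and normalize_gmail.

```python
# Assumed set of domains treated as Gmail.
GMAIL_DOMAINS = {"gmail.com", "googlemail.com"}


def remove_gmail_dots(email):
    """Strip dots from the username, but only for Gmail domains."""
    if not email or "@" not in email:
        return email
    username, domain = email.rsplit("@", 1)
    if domain.lower() not in GMAIL_DOMAINS:
        return email  # other providers may treat dots as significant
    return f"{username.replace('.', '')}@{domain}"


def normalize_gmail(email):
    """Lowercase, drop any +tag, and strip dots; non-Gmail input is unchanged."""
    if not email or "@" not in email:
        return email
    username, domain = email.lower().rsplit("@", 1)
    if domain not in GMAIL_DOMAINS:
        return email
    username = username.split("+", 1)[0].replace(".", "")
    return f"{username}@{domain}"


print(remove_gmail_dots("j.o.h.n@gmail.com"))       # john@gmail.com
print(normalize_gmail("John.Doe+promo@Gmail.com"))  # johndoe@gmail.com
```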

emails.get_canonical_email

Get the canonical form of an email address for deduplication. This applies the most aggressive normalization available.

Parameters

| Property | Type | Description |
| --- | --- | --- |
| col (required) | Column | Column containing email address |
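Canonical forms are useful as a deduplication key: addresses that normalize to the same string are treated as one. This sketch assumes that "maximum normalization" means trimming, lowercasing, dropping +tags for every domain, and stripping dots for Gmail. The library's exact steps are not listed here, and the names canonical and dedupe are hypothetical.

```python
# Assumed set of domains treated as Gmail.
GMAIL_DOMAINS = {"gmail.com", "googlemail.com"}


def canonical(email):
    """Assumed maximum normalization: trim, lowercase, drop any +tag,
    and strip dots from Gmail usernames. Returns None without an "@"."""
    if not email or "@" not in email:
        return None
    username, domain = email.strip().lower().rsplit("@", 1)
    username = username.split("+", 1)[0]
    if domain in GMAIL_DOMAINS:
        username = username.replace(".", "")
    return f"{username}@{domain}"


def dedupe(emails):
    """Keep the first original address seen for each canonical key."""
    seen, kept = set(), []
    for email in emails:
        key = canonical(email)
        if key is not None and key not in seen:
            seen.add(key)
            kept.append(email)
    return kept


rows = ["John.Doe@Gmail.com", "johndoe+news@gmail.com", " jane@outlook.com", "JANE@outlook.com"]
print(dedupe(rows))  # ['John.Doe@Gmail.com', ' jane@outlook.com']
```

Dropping +tags for every domain is a trade-off. It can merge addresses that are genuinely distinct on providers that do not use "+" as a subaddress separator.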

emails.get_email_provider

Get the email provider name from the domain.

Parameters

| Property | Type | Description |
| --- | --- | --- |
| col (required) | Column | Column containing email address |

emails.mask_email

Mask email address for privacy (e.g., joh***@gm***.com).

Parameters

| Property | Type | Description |
| --- | --- | --- |
| col (required) | Column | Column containing email address |
| mask_char (required) | Column | Character to use for masking |
| keep_chars (required) | Column | Number of characters to keep at start |
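This sketch masks the username and the domain name separately while keeping the TLD visible. It always appends exactly three mask characters, so the output does not reveal the original lengths. The documented example (joh***@gm***.com) keeps fewer characters of the domain than of the username, so the library's exact rule may differ from this one. The name mask and its defaults are assumptions.

```python
def mask(email, mask_char="*", keep_chars=3):
    """Keep the first keep_chars of the username and of the domain name,
    replace the rest with three mask characters, and keep the TLD."""
    if not email or "@" not in email:
        return email
    username, domain = email.rsplit("@", 1)
    name, dot, tld = domain.rpartition(".")
    if not dot:  # no "." in the domain: mask it whole, nothing to preserve
        name, tld = domain, ""
    masked_domain = name[:keep_chars] + mask_char * 3 + dot + tld
    return f"{username[:keep_chars]}{mask_char * 3}@{masked_domain}"


print(mask("john.doe@gmail.com"))                # joh***@gma***.com
print(mask("john.doe@gmail.com", "#", 2))        # jo###@gm###.com
```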

emails.filter_valid_emails

Return the email only if it is valid; otherwise, return null.

Parameters

| Property | Type | Description |
| --- | --- | --- |
| col (required) | Column | Column containing email address |

emails.filter_corporate_emails

Return the email only if it is corporate; otherwise, return null.

Parameters

| Property | Type | Description |
| --- | --- | --- |
| col (required) | Column | Column containing email address |

emails.filter_non_disposable_emails

Return the email only if it is not disposable; otherwise, return null.

Parameters

| Property | Type | Description |
| --- | --- | --- |
| col (required) | Column | Column containing email address |
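The three filter functions share one pattern: keep the value when a predicate holds, otherwise return null. In this sketch, Python's None stands in for null. The helper keep_if, the domain lists, and the lambda predicates are hypothetical illustrations, not the library's API.

```python
from typing import Callable, Optional

# Illustrative lists only (assumptions).
FREE_PROVIDERS = {"gmail.com", "yahoo.com", "outlook.com"}
DISPOSABLE = {"mailinator.com"}


def keep_if(email: Optional[str], predicate: Callable[[str], bool]) -> Optional[str]:
    """Return the email when the predicate holds, otherwise None (null)."""
    return email if email is not None and predicate(email) else None


def domain_of(email: str) -> str:
    return email.rsplit("@", 1)[-1].lower()


rows = ["info@company-name.org", "jane@outlook.com", "temp@mailinator.com"]
corporate = [keep_if(e, lambda x: domain_of(x) not in FREE_PROVIDERS) for e in rows]
non_disposable = [keep_if(e, lambda x: domain_of(x) not in DISPOSABLE) for e in rows]
print(corporate)       # ['info@company-name.org', None, 'temp@mailinator.com']
print(non_disposable)  # ['info@company-name.org', 'jane@outlook.com', None]
```

The output shows that the filters are independent: the disposable address passes the corporate filter. Chain both filters to drop it as well.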