Phone Number Transformers
Standardize, validate, and extract components from phone numbers.
Usage
| phone | standardized | is_valid | area_code | local_number | has_ext | extension | is_toll_free |
| --- | --- | --- | --- | --- | --- | --- | --- |
| (555) 123-4567 | (555) 123-4567 | true | 555 | 1234567 | false | null | false |
| +1-800-555-1234 | +1 800-555-1234 | true | 800 | 5551234 | false | null | true |
| 555.123.4567 ext 890 | 555.123.4567 | true | 555 | 1234567 | true | 890 | false |
| 123-45-67 | null | false | null | null | false | null | false |
| 1-800-FLOWERS | 1-800-356-9377 | true | 800 | 3569377 | false | null | true |
| 415 555 0123 | 415-555-0123 | true | 415 | 5550123 | false | null | false |
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from transformers.pyspark.phone_numbers import phone_numbers
# Initialize Spark
spark = SparkSession.builder.appName("PhoneCleaning").getOrCreate()
# Create sample data
data = [
    ("(555) 123-4567",),
    ("+1-800-555-1234",),
    ("555.123.4567 ext 890",),
    ("123-45-67",),
    ("1-800-FLOWERS",),
    ("415 555 0123",),
]
df = spark.createDataFrame(data, ["phone"])
# Apply transformations
result_df = df.select(
    F.col("phone"),
    phone_numbers.standardize_phone(F.col("phone")).alias("standardized"),
    phone_numbers.is_valid_phone(F.col("phone")).alias("is_valid"),
    phone_numbers.extract_area_code(
        phone_numbers.standardize_phone(F.col("phone"))
    ).alias("area_code"),
    phone_numbers.extract_local_number(
        phone_numbers.standardize_phone(F.col("phone"))
    ).alias("local_number"),
    phone_numbers.has_extension(F.col("phone")).alias("has_ext"),
    phone_numbers.extract_extension(F.col("phone")).alias("extension"),
    phone_numbers.is_toll_free(
        phone_numbers.standardize_phone(F.col("phone"))
    ).alias("is_toll_free"),
)
# Show results
result_df.show(truncate=False)
Installation
datacompose add phone_numbers
API Reference
Extract Functions
phone_numbers.extract_phone_from_text
Extract first phone number from text using regex patterns.
Parameters
Property | Type | Description |
---|---|---|
col required | Column | Column containing text with potential phone numbers |
phone_numbers.extract_all_phones_from_text
Extract all phone numbers from text as an array.
Parameters
Property | Type | Description |
---|---|---|
col required | Column | Column containing text with potential phone numbers |
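To make "first match" versus "all matches" concrete, here is a minimal plain-Python sketch. The pattern and the names `PHONE_PATTERN`, `find_first_phone`, and `find_all_phones` are assumptions for illustration; the regex patterns the library actually uses are not shown on this page and may differ.

```python
import re
from typing import List, Optional

# Hypothetical pattern for US-style numbers: an optional "+1"/"1" prefix, then
# 3-3-4 digits separated by spaces, dots, or hyphens, with optional parentheses
# around the area code. Not the library's pattern.
PHONE_PATTERN = re.compile(
    r"(?:\+?1[\s.-]?)?(?:\(\d{3}\)|\d{3})[\s.-]?\d{3}[\s.-]?\d{4}"
)


def find_first_phone(text: str) -> Optional[str]:
    """Return the first substring matching PHONE_PATTERN, or None."""
    match = PHONE_PATTERN.search(text)
    return match.group(0) if match else None


def find_all_phones(text: str) -> List[str]:
    """Return every non-overlapping match of PHONE_PATTERN, in order."""
    return PHONE_PATTERN.findall(text)
```

For the text `"Call (555) 123-4567 or 415-555-0123 today."`, the first helper returns `"(555) 123-4567"` and the second returns both numbers as a list.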
phone_numbers.extract_digits
Extract only digits from phone number string.
Parameters
Property | Type | Description |
---|---|---|
col required | Column | Column containing phone number |
phone_numbers.extract_extension
Extract extension from phone number if present.
Parameters
Property | Type | Description |
---|---|---|
col required | Column | Column containing phone number |
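The following plain-Python sketch shows one way an extension could be detected and pulled out of a phone string. The accepted markers (`ext`, `ext.`, `extension`, `x`) and the helper names are assumptions; the library's own matching rules are not documented here.

```python
import re
from typing import Optional

# Assumed markers: "ext", "ext.", "extension", or "x", followed by digits at
# the end of the string. The library may accept a different set.
EXTENSION_PATTERN = re.compile(r"(?:ext(?:ension)?\.?|x)\s*(\d+)\s*$", re.IGNORECASE)


def sketch_extract_extension(phone: str) -> Optional[str]:
    """Return the trailing extension digits, or None if no marker is found."""
    match = EXTENSION_PATTERN.search(phone)
    return match.group(1) if match else None


def sketch_remove_extension(phone: str) -> str:
    """Return the phone string with any trailing extension stripped."""
    return EXTENSION_PATTERN.sub("", phone).rstrip()
```

With the sample value `"555.123.4567 ext 890"`, the first helper returns `"890"` and the second returns `"555.123.4567"`.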
phone_numbers.extract_country_code
Extract country code from phone number.
Parameters
Property | Type | Description |
---|---|---|
col required | Column | Column containing phone number |
phone_numbers.extract_area_code
Extract area code from NANP phone number.
Parameters
Property | Type | Description |
---|---|---|
col required | Column | Column containing phone number |
phone_numbers.extract_exchange
Extract exchange (first 3 digits of local number) from NANP phone number.
Parameters
Property | Type | Description |
---|---|---|
col required | Column | Column containing phone number |
phone_numbers.extract_subscriber
Extract subscriber number (last 4 digits) from NANP phone number.
Parameters
Property | Type | Description |
---|---|---|
col required | Column | Column containing phone number |
phone_numbers.extract_local_number
Extract local number (exchange + subscriber) from NANP phone number.
Parameters
Property | Type | Description |
---|---|---|
col required | Column | Column containing phone number |
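The area code, exchange, subscriber, and local number are fixed slices of a 10-digit NANP number. The sketch below illustrates that layout in plain Python; `NanpParts` and `split_nanp` are hypothetical names, and the handling of a leading `1` is an assumption rather than documented library behavior.

```python
import re
from typing import NamedTuple, Optional


class NanpParts(NamedTuple):
    area_code: str   # first 3 digits
    exchange: str    # next 3 digits
    subscriber: str  # last 4 digits

    @property
    def local_number(self) -> str:
        """Exchange followed by subscriber (7 digits)."""
        return self.exchange + self.subscriber


def split_nanp(phone: str) -> Optional[NanpParts]:
    """Split a NANP number into its parts, or return None if it is not 10 digits."""
    digits = re.sub(r"\D", "", phone)
    # Assumption: an 11-digit number starting with "1" carries the country code.
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    if len(digits) != 10:
        return None
    return NanpParts(digits[:3], digits[3:6], digits[6:])
```

For `"+1 800-555-1234"` this yields area code `800`, exchange `555`, subscriber `1234`, and local number `5551234`; a 7-digit input such as `"123-45-67"` yields `None`.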
Transform Functions
phone_numbers.standardize_phone
Standardize phone number with cleaning and NANP formatting.
Parameters
Property | Type | Description |
---|---|---|
col required | Column | Column containing phone number |
phone_numbers.standardize_phone_e164
Standardize phone number with cleaning and E.164 formatting.
Parameters
Property | Type | Description |
---|---|---|
col required | Column | Column containing phone number |
phone_numbers.standardize_phone_digits
Standardize phone number and return digits only.
Parameters
Property | Type | Description |
---|---|---|
col required | Column | Column containing phone number |
Validation Functions
phone_numbers.is_valid_nanp
Check if phone number is valid NANP format (North American Numbering Plan).
Parameters
Property | Type | Description |
---|---|---|
col required | Column | Column containing phone number |
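For reference, the NANP layout is NXX-NXX-XXXX, where N is a digit from 2 to 9 and X is any digit. The plain-Python sketch below applies that rule strictly; `looks_like_nanp` is a hypothetical helper, and whether the library enforces the same first-digit restrictions is not stated on this page.

```python
import re

# Strict NANP shape after stripping punctuation: optional leading "1", then an
# area code and an exchange that each start with 2-9, then four more digits.
NANP_DIGITS = re.compile(r"^1?[2-9]\d{2}[2-9]\d{6}$")


def looks_like_nanp(phone: str) -> bool:
    """Return True if the digits in phone fit the strict NXX-NXX-XXXX layout."""
    digits = re.sub(r"\D", "", phone)
    return bool(NANP_DIGITS.match(digits))
```

Under this strict rule `"415-555-0123"` passes, while `"415-155-0123"` fails because the exchange starts with 1.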
phone_numbers.is_valid_international
Check if phone number could be valid international format.
Parameters
Property | Type | Description |
---|---|---|
col required | Column | Column containing phone number |
min_length required | Column | Minimum digits for international number |
max_length required | Column | Maximum digits for international number |
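A length-based check of this kind can be sketched in plain Python as follows. The helper name and the default bounds are assumptions: 15 is the maximum number of digits E.164 allows, while 7 is only an illustrative lower bound.

```python
import re


def could_be_international(phone: str, min_length: int = 7, max_length: int = 15) -> bool:
    """Return True if the digit count of phone lies within [min_length, max_length]."""
    digits = re.sub(r"\D", "", phone)
    return min_length <= len(digits) <= max_length
```

This only bounds the digit count, which is why the result is "could be valid" rather than a guarantee that the number exists.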
phone_numbers.is_valid_phone
Check if phone number is valid (NANP or international).
Parameters
Property | Type | Description |
---|---|---|
col required | Column | Column containing phone number |
phone_numbers.is_toll_free
Check if phone number is toll-free (800, 888, 877, 866, 855, 844, 833).
Parameters
Property | Type | Description |
---|---|---|
col required | Column | Column containing phone number |
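The toll-free check amounts to a set-membership test on the area code. Here is a minimal plain-Python sketch using the prefixes listed above; the helper name and the treatment of a leading `1` are assumptions.

```python
import re

# Prefixes taken from the description above.
TOLL_FREE_AREA_CODES = {"800", "888", "877", "866", "855", "844", "833"}


def area_code_is_toll_free(phone: str) -> bool:
    """Return True if phone is a 10-digit number whose area code is toll-free."""
    digits = re.sub(r"\D", "", phone)
    # Assumption: drop a leading country code "1" from 11-digit numbers.
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return len(digits) == 10 and digits[:3] in TOLL_FREE_AREA_CODES
```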
phone_numbers.is_premium_rate
Check if phone number is premium rate (900).
Parameters
Property | Type | Description |
---|---|---|
col required | Column | Column containing phone number |
phone_numbers.has_extension
Check if phone number has an extension.
Parameters
Property | Type | Description |
---|---|---|
col required | Column | Column containing phone number |
Utility Functions
phone_numbers.remove_non_digits
Remove all non-digit characters from phone number.
Parameters
Property | Type | Description |
---|---|---|
col required | Column | Column containing phone number |
phone_numbers.remove_extension
Remove extension from phone number.
Parameters
Property | Type | Description |
---|---|---|
col required | Column | Column containing phone number |
phone_numbers.convert_letters_to_numbers
Convert phone letters to numbers (e.g., 1-800-FLOWERS to 1-800-3569377).
Parameters
Property | Type | Description |
---|---|---|
col required | Column | Column containing phone number with letters |
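The conversion follows the standard telephone keypad (2 = ABC, 3 = DEF, and so on). The plain-Python sketch below shows the mapping; `letters_to_digits` is a hypothetical helper, and uppercasing the input first is an assumption about how mixed case is handled.

```python
# Standard telephone keypad groups.
KEYPAD_GROUPS = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
}

# Invert the groups into a per-letter translation table.
LETTER_TO_DIGIT = {
    letter: digit for digit, letters in KEYPAD_GROUPS.items() for letter in letters
}
_TRANSLATION = str.maketrans(LETTER_TO_DIGIT)


def letters_to_digits(phone: str) -> str:
    """Replace keypad letters with digits; other characters are left unchanged."""
    return phone.upper().translate(_TRANSLATION)
```

Applied to `"1-800-FLOWERS"`, this returns `"1-800-3569377"`, matching the example in the description.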
phone_numbers.normalize_separators
Normalize various separator styles to hyphens. Removes parentheses and replaces dots and spaces with hyphens.
Parameters
Property | Type | Description |
---|---|---|
col required | Column | Column containing phone number |
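As an illustration of the described behavior, here is a plain-Python sketch. `hyphenate_separators` is a hypothetical name, and collapsing a run of several separators into a single hyphen is an assumption.

```python
import re


def hyphenate_separators(phone: str) -> str:
    """Drop parentheses, then turn runs of dots/whitespace into single hyphens."""
    no_parens = re.sub(r"[()]", "", phone)
    return re.sub(r"[.\s]+", "-", no_parens.strip())
```

Under these assumptions, `"(555) 123-4567"`, `"555.123.4567"`, and `"555 123 4567"` all become `"555-123-4567"`.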
phone_numbers.add_country_code
Add country code "1" if not present (for NANP numbers).
Parameters
Property | Type | Description |
---|---|---|
col required | Column | Column containing phone number |
phone_numbers.format_nanp
Format NANP phone number in standard hyphen format (XXX-XXX-XXXX).
Parameters
Property | Type | Description |
---|---|---|
col required | Column | Column containing phone number |
phone_numbers.format_nanp_paren
Format NANP phone number with parentheses ((XXX) XXX-XXXX).
Parameters
Property | Type | Description |
---|---|---|
col required | Column | Column containing phone number |
phone_numbers.format_nanp_dot
Format NANP phone number with dots (XXX.XXX.XXXX).
Parameters
Property | Type | Description |
---|---|---|
col required | Column | Column containing phone number |
phone_numbers.format_nanp_space
Format NANP phone number with spaces (XXX XXX XXXX).
Parameters
Property | Type | Description |
---|---|---|
col required | Column | Column containing phone number |
phone_numbers.format_international
Format international phone number with country code.
Parameters
Property | Type | Description |
---|---|---|
col required | Column | Column containing phone number |
phone_numbers.format_e164
Format phone number in E.164 format (+CCAAANNNNNNN) with default country code 1.
Parameters
Property | Type | Description |
---|---|---|
col required | Column | Column containing phone number |
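E.164 writes a number as `+`, the country code, and the national number, with no separators. The sketch below illustrates this for NANP input in plain Python; `to_e164` is a hypothetical helper, and returning `None` for anything other than 10 or 11 digits is an assumption, not documented library behavior.

```python
import re
from typing import Optional


def to_e164(phone: str) -> Optional[str]:
    """Format a NANP number as E.164 using country code 1, or return None."""
    digits = re.sub(r"\D", "", phone)
    if len(digits) == 10:
        # No country code present: prepend the default "1".
        return "+1" + digits
    if len(digits) == 11 and digits.startswith("1"):
        # Country code already present.
        return "+" + digits
    return None
```

For example, `"(415) 555-0123"` becomes `"+14155550123"`.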
phone_numbers.clean_phone
Clean and validate phone number, returning null for invalid numbers.
Parameters
Property | Type | Description |
---|---|---|
col required | Column | Column containing phone number |
phone_numbers.get_phone_type
Get phone number type (toll-free, premium, standard, international).
Parameters
Property | Type | Description |
---|---|---|
col required | Column | Column containing phone number |
phone_numbers.get_region_from_area_code
Get geographic region from area code. This is a simplified implementation; complete coverage would need an area-code lookup table.
Parameters
Property | Type | Description |
---|---|---|
col required | Column | Column containing phone number |
phone_numbers.mask_phone
Mask phone number for privacy keeping last 4 digits (e.g., ***-***-1234).
Parameters
Property | Type | Description |
---|---|---|
col required | Column | Column containing phone number |
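The masking format in the description can be sketched in plain Python as shown below. `mask_all_but_last_four` is a hypothetical helper; it assumes the input carries no extension and returns `None` when fewer than four digits are present, which may not match the library.

```python
import re
from typing import Optional


def mask_all_but_last_four(phone: str) -> Optional[str]:
    """Return '***-***-' plus the last four digits, or None if too short."""
    digits = re.sub(r"\D", "", phone)
    if len(digits) < 4:
        return None
    return "***-***-" + digits[-4:]
```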
phone_numbers.filter_valid_phones
Return phone number only if valid, otherwise return null.
Parameters
Property | Type | Description |
---|---|---|
col required | Column | Column containing phone number |
phone_numbers.filter_nanp_phones
Return phone number only if valid NANP, otherwise return null.
Parameters
Property | Type | Description |
---|---|---|
col required | Column | Column containing phone number |
phone_numbers.filter_toll_free_phones
Return phone number only if toll-free, otherwise return null.
Parameters
Property | Type | Description |
---|---|---|
col required | Column | Column containing phone number |