From Issue #158
June 2007
The Internet Engineering Task Force (IETF) document, RFC 3696,
“Application Techniques for Checking and Transformation of Names” by John
Klensin, presents several valid e-mail addresses that are rejected by many PHP
validation routines. The addresses:
Abc\@def@example.com,
customer/department=shipping@example.com and
!def!xyz%abc@example.com
are all legitimate. One of the more popular regular expressions found in the
literature rejects all of them:
"^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)
↪*(\.[a-z]{2,3})$"
This regular expression allows only the underscore (_) and hyphen
(-) characters, numbers and lowercase alphabetic characters. Even
assuming a preprocessing step that converts uppercase alphabetic
characters to lowercase, the expression rejects addresses with
valid characters, such as the slash (/), equal sign (=), exclamation
point (!) and percent (%). The expression also requires that the
top-level domain component has only two or three characters, thus
rejecting legitimate domains, such as .museum.
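To make the rejections concrete, here is a minimal sketch that runs the expression above against the RFC 3696 examples. It assumes the pattern is wrapped in PCRE delimiters and used with preg_match() (the older ereg() functions were removed in PHP 7), and the address curator@example.museum is a made-up illustration of a long top-level domain.

```php
<?php
// The restrictive pattern quoted above, wrapped in PCRE delimiters.
// Using preg_match() here is an assumption of this sketch.
$pattern = '/^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})$/';

// Three valid addresses from RFC 3696, plus a hypothetical address
// with a top-level domain longer than three characters.
$addresses = [
    'Abc\@def@example.com',
    'customer/department=shipping@example.com',
    '!def!xyz%abc@example.com',
    'curator@example.museum',
];

$results = [];
foreach ($addresses as $address) {
    // Lowercase first, mirroring the preprocessing step assumed in the text.
    $results[$address] = preg_match($pattern, strtolower($address)) === 1;
}

foreach ($results as $address => $accepted) {
    echo $address, ' => ', $accepted ? 'accepted' : 'rejected', PHP_EOL;
}
```

Every address in the list comes back as rejected, even though all four are legitimate.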
Another favorite regular expression solution is the following:
"^[a-zA-Z0-9_.-]+@[a-zA-Z0-9-]+.[a-zA-Z0-9-.]+$"
This regular expression rejects all the valid examples in the preceding paragraph.
It does have the grace to allow uppercase alphabetic characters, and
it doesn't make the error of assuming a top-level domain name has only
two or three characters. It does, however, allow invalid domain names, such as
example..com.
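The same kind of check shows both failure modes of this second expression. This is a sketch only: it again assumes PCRE delimiters and preg_match(), and user@example..com is a made-up address chosen to exercise the unescaped dot in the domain part.

```php
<?php
// The second pattern quoted above, wrapped in PCRE delimiters.
// Note the unescaped "." after the first domain label: it matches any character.
$pattern = '/^[a-zA-Z0-9_.-]+@[a-zA-Z0-9-]+.[a-zA-Z0-9-.]+$/';

$addresses = [
    'Abc\@def@example.com',                     // valid, from RFC 3696
    'customer/department=shipping@example.com', // valid, from RFC 3696
    '!def!xyz%abc@example.com',                 // valid, from RFC 3696
    'user@example..com',                        // invalid domain (hypothetical)
];

$results = [];
foreach ($addresses as $address) {
    $results[$address] = preg_match($pattern, $address) === 1;
}

foreach ($results as $address => $accepted) {
    echo $address, ' => ', $accepted ? 'accepted' : 'rejected', PHP_EOL;
}
```

The three valid addresses are rejected, while the address with the doubled dot is accepted.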
Listing 1 shows an example from PHP Dev Shed (www.devshed.com/c/a/PHP/Email-Address-Verification-with-PHP/2).
The code contains (at least) three errors. First, it fails to recognize
many valid e-mail address characters, such as percent (%). Second, it
splits the e-mail address into user name and domain parts at the at sign
(@). E-mail addresses that contain a quoted at sign, such as
Abc\@def@example.com, will break this code. Third, it fails to check
for host address DNS records. Hosts with a type A DNS entry will accept
e-mail and may not necessarily publish a type MX entry. I'm not
picking on the author at PHP Dev Shed. More than 100 reviewers gave
this a four-out-of-five-star rating.
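The second and third errors suggest two small repairs, sketched below under stated assumptions. The function names splitAddress() and domainAcceptsMail() are hypothetical, not taken from the Dev Shed listing. The sketch splits at the last at sign rather than the first, and it treats a domain as able to receive mail if PHP's checkdnsrr() finds either an MX or an A record. The lookup is passed in as a parameter only so the logic can be tried without network access. It does not validate the characters of either part.

```php
<?php
// Split at the LAST at sign so an "@" inside the local part survives.
// Returns null when the address contains no at sign at all.
function splitAddress(string $email): ?array
{
    $atIndex = strrpos($email, '@');
    if ($atIndex === false) {
        return null;
    }
    return [
        'local'  => substr($email, 0, $atIndex),
        'domain' => substr($email, $atIndex + 1),
    ];
}

// Accept a domain that publishes either an MX record or an A record.
// $lookup defaults to PHP's checkdnsrr(); the parameter is an assumption
// of this sketch, added so a stub can stand in for real DNS queries.
function domainAcceptsMail(string $domain, ?callable $lookup = null): bool
{
    $lookup = $lookup ?? 'checkdnsrr';
    return $lookup($domain, 'MX') || $lookup($domain, 'A');
}

// Usage: the escaped at sign stays in the local part.
$parts = splitAddress('Abc\@def@example.com');
echo $parts['local'], ' | ', $parts['domain'], PHP_EOL;
```

Calling domainAcceptsMail($parts['domain']) with no second argument performs real DNS queries, so the result depends on the network and on the domain's published records.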