Emoji to Zero-Day: Latin Homoglyphs in Domains and Subdomains

Posted by Matt Hamilton on March 4, 2020


The Vulnerability

Prior to this advisory, it was possible to register homograph domain names on gTLDs (.com, .net, etc.) as well as subdomains within some SaaS companies using homoglyph characters. This vulnerability is similar to an IDN Homograph attack and presents all the same risks. An attacker could register a domain or subdomain which appears visually identical to its legitimate counterpart and perform social-engineering or insider attacks against an organization.

Between 2017 and today, more than a dozen homograph domains have had active HTTPS certificates. This included prominent financial, internet shopping, technology, and other Fortune 100 sites. There is no legitimate or non-fraudulent justification for this activity (excluding the research I conducted for this responsible disclosure 😊). While it is unlikely that you, the reader, were attacked with this technique, it is likely that this technique was used in highly targeted social-engineering campaigns.

The discovery of this activity led to the vulnerability being reclassified as a zero-day on February 13, 2020. The resulting disclosure deadline of February 20, 2020 was later extended to allow Verisign to apply mitigations for gTLDs.

The following table shows “lookalike” Unicode Latin IPA Extension homoglyph characters and their Latin counterparts:

Latin:          a  g  l
IPA Extension:  ɑ  ɡ  ɩ

*Note that these characters may appear differently on different machines, as fonts vary.

The “ɡ” (Voiced Velar Stop) is the most convincing character, often nearly indistinguishable from its Latin counterpart.

The “ɑ” (Latin Alpha) is also very convincing, particularly when not adjacent to a Latin “a”.

The “ɩ” (Latin Iota) is the least convincing of the group. On some systems and fonts this character appears very similar to a lowercase “L”, but it’s more often the case that this character can be discerned from its Latin counterpart.
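Because these characters can render identically to their Latin counterparts, the reliable way to tell them apart is by code point. The following is a minimal sketch, not part of the original research, that prints the code point and formal Unicode name of each pair. The HOMOGLYPHS mapping and the describe helper are names chosen for illustration. Note that the formal Unicode names differ from the phonetic names used in this post (for example, “ɡ” is formally LATIN SMALL LETTER SCRIPT G).

```python
import unicodedata

# Latin letter -> IPA Extension lookalike, as listed in the table above.
# (Illustrative mapping; the name HOMOGLYPHS is an assumption of this sketch.)
HOMOGLYPHS = {"a": "\u0251", "g": "\u0261", "l": "\u0269"}


def describe(ch):
    """Return the code point and formal Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for latin, lookalike in HOMOGLYPHS.items():
    print(f"{latin}: {describe(latin)}  |  {lookalike}: {describe(lookalike)}")
```

All three lookalikes fall in the IPA Extensions block (U+0250 to U+02AF), and their Unicode names begin with LATIN, which is relevant to the mixed-script discussion later in this post.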

 


Preamble

In late November 2019, I attempted to register a bucket (and therefore a subdomain) with a Unicode Emoji character. I then became curious if it was possible to register Unicode homoglyph characters in bucket names. Sure enough, it was possible to register Latin homoglyph characters, specifically Unicode Latin IPA Extension homoglyphs. I then checked if it was possible to register domains with these homoglyph characters. Ruh-roh, it was.


Background

This disclosure demonstrates the use of Unicode Latin IPA homoglyph characters to generate domain and subdomain homographs.

Generally speaking, homograph attacks are not novel. This type of attack has been known for many years and domain providers have put mitigations in place. These include restricting the use of some characters and preventing the use of mixed scripts, such as Latin and Cyrillic.

It appears that Verisign and other providers have been unaware of the homoglyphs within the Unicode Latin IPA Extension character set. This disclosure aims to spread awareness of this issue and allow vendors to establish mitigations for this risk.


Responsible Disclosure

In a partnership between Soluble and Bishop Fox, Verisign and IaaS services (Google, Amazon, Wasabi, DigitalOcean) were notified of the vulnerability and have received continuous updates on the ongoing research. Some of these vendors were responsive and engaged in productive dialog, though others have not responded or did not want to fix the issue.

In the weeks that followed, a tool was developed to facilitate generating domain permutations for these homoglyph characters and checking Certificate Transparency logs.
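To illustrate what generating domain permutations involves, here is a minimal sketch. It is not the tool described above (which is written in Go), and the function name homograph_permutations is hypothetical. It assumes only the three homoglyphs covered in this post and enumerates every way of substituting them into a label.

```python
from itertools import product

# The three IPA Extension lookalikes discussed in this post.
HOMOGLYPHS = {"a": "\u0251", "g": "\u0261", "l": "\u0269"}


def homograph_permutations(label):
    """Return every variant of label with at least one homoglyph substituted."""
    # For each character, offer the original and (if one exists) its lookalike.
    choices = [(ch, HOMOGLYPHS[ch]) if ch in HOMOGLYPHS else (ch,) for ch in label]
    variants = ("".join(combo) for combo in product(*choices))
    # Drop the unmodified label itself.
    return [v for v in variants if v != label]


perms = homograph_permutations("gmail")
print(len(perms))  # 7: three substitutable letters give 2**3 - 1 variants
for p in perms:
    print(p + ".com")
```

The number of variants doubles with each substitutable letter, so a label with many occurrences of “a”, “g”, or “l” produces a long list to check.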

It was discovered that between 2017 and the present, third parties had registered and generated HTTPS certificates for 15 of the 300 tested domains using this homoglyph technique. Additionally, one instance of a homoglyph domain hosting an unofficial and presumed malicious jQuery library was found.

Upon identifying current and historical abuse of these homoglyphs, the issue was reclassified as a zero-day. Subsequently, the aforementioned vendors were notified and the disclosure timeline was reduced to seven days per Soluble’s Vulnerability Disclosure Policy and Bishop Fox’s Vulnerability Disclosure Policy. After vendors were notified, US CERT was contacted to assist in remediation.

Due to the industry-wide implications of releasing this advisory, it was deemed an exceptional circumstance per our disclosure policy. Additional time was allotted to allow Verisign to implement vulnerability mitigations for gTLDs.

The disclosure timeline is shown below:

11/22/2019 - Vulnerability identified.
11/23/2019 - Amazon responsible disclosure report submitted, receipt confirmed.
11/26/2019 - Google responsible disclosure report submitted, receipt confirmed. Last contact from Google.
12/10/2019 - Wasabi responsible disclosure report submitted, receipt confirmed. Last contact from Wasabi.
1/27/2020 - Conclusion reached that this vulnerability was severe enough to warrant a report and remediation from Verisign.
1/28/2020 - Verisign responsible disclosure report submitted, receipt confirmed.
1/30/2020 - DigitalOcean responsible disclosure report submitted, receipt confirmed.
2/07/2020 - DigitalOcean confirms intent to investigate mitigations, but states “we view this a very low risk for our users at this time”. Last contact from DigitalOcean.
2/10/2020 - Verisign informed of registered homograph domains.
2/13/2020 - Issue reclassified as zero-day. Verisign, Google, Amazon, Wasabi, and DigitalOcean notified of the reclassification and placed under a seven-day embargo.
2/14/2020 - US CERT contacted to assist in remediation, as a contingency for a lack of response from Verisign. No response from US CERT.
2/20/2020 - Feedback provided to ICANN on IDN guideline revision drafts, disclosure deadline extended.
2/24/2020 - Amazon changes S3 bucket name validation policy to prevent registration of bucket names beginning with “xn--”, mitigating these (and all other) Unicode homoglyphs.
3/03/2020 - Verisign implements mitigations for ".com" and ".net", preventing registration of domains containing outlined homoglyphs.
3/04/2020 - Public disclosure.


Domain Names on gTLDs

At the time of writing, it was possible to register homographs of prominent domains using the Unicode Latin IPA Extension characters above. This applies to gTLDs run by Verisign (.com, .net, etc.). TLDs maintained by other providers were not tested as a part of this research.

To demonstrate impact for gTLDs and prevent registration by malicious third parties, I registered the following domains using IPA Extension homoglyph characters:

amɑzon.com**
chɑse.com
sɑlesforce.com
ɡmɑil.com
ɑppɩe.com
ebɑy.com
ɡstatic.com
steɑmpowered.com
theɡuardian.com
theverɡe.com
washinɡtonpost.com
pɑypɑɩ.com
wɑlmɑrt.com
wɑsɑbisys.com
yɑhoo.com
cɩoudfɩare.com
deɩɩ.com
gmɑiɩ.com
gooɡleapis.com
huffinɡtonpost.com
instaɡram.com
microsoftonɩine.com
ɑmɑzonɑws.com**
ɑndroid.com
netfɩix.com
nvidiɑ.com
ɡoogɩe.com

Cost? ~$400. Value? Priceless.

*If your organization owns the legitimate (non-homograph) counterpart of any of these domains and wants the homograph, please contact me and I will happily transfer it to your organization at no cost. Unfortunately, I simply don’t have the time to reach out to each and every one of these organizations individually.
** The noted domains have been transferred to the respective organization.

Why was I able to register these domains?

Verisign prevents registering domains which use mixed scripts. For example, it is not possible to register a domain such as “gооgle.com” using Cyrillic “о”s (though it seems Google has been grandfathered in and does in fact own this domain). Interestingly, most registrars will allow you to check out with these mixed-script domains in your cart, and some go so far as to bill (and later refund) your credit card. However, Verisign kicks back the check command from registrars and prevents these registrations from completing.

While it wasn't possible to use mixed-scripts, it was possible to register domains with a mix of Unicode and Latin characters as long as the Unicode characters were themselves Latin. The homoglyphs discussed in this advisory were listed in Verisign’s Latin script allowed character table. This table has been in place since 2014. Verisign has since mitigated this vulnerability by removing those characters, and they will likely be updating this table soon.
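The gap can be illustrated with a naive mixed-script check. This is a sketch under stated assumptions, not Verisign's actual validation logic. Python's standard library does not expose the Unicode Script property, so the hypothetical scripts_in helper approximates a character's script by the first word of its Unicode name, which works for the characters discussed here but is not a general solution.

```python
import unicodedata


def scripts_in(label):
    """Approximate each letter's script by the first word of its Unicode name."""
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}


def is_mixed_script(label):
    """Return True if the label's letters come from more than one script."""
    return len(scripts_in(label)) > 1


# Two Cyrillic "о" characters (U+043E) mixed with Latin letters are flagged.
print(is_mixed_script("g\u043e\u043egle"))  # True

# The IPA "ɡ" (U+0261) is itself a Latin-script letter, so nothing is flagged.
print(is_mixed_script("\u0261oogle"))  # False
```

A check of this kind catches the Cyrillic lookalike but passes the IPA one, because both “g” and “ɡ” belong to the Latin script. Blocking these homoglyphs therefore requires restricting the allowed character table itself, not just detecting mixed scripts.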


Suggested Remediation

Latin TLD registries should prevent the registration of domain names which contain these Unicode Latin IPA Extension homoglyph characters. Verisign has applied fixes that make it impossible to register ".com" and ".net" domains containing these homoglyph characters.

It is recommended that organizations review the homograph permutations and the associated Certificate Transparency logs for their domains. I have created a tool to assist with this, available here.
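Certificates and DNS records carry internationalized names in their ASCII-compatible (Punycode) form, so a homograph has to be converted before it can be searched for in Certificate Transparency logs. The following is a minimal sketch of that conversion using Python's built-in punycode codec. The helper names are hypothetical, and the sketch deliberately skips the normalization and validation steps that a full IDNA implementation performs.

```python
ACE_PREFIX = "xn--"


def to_a_label(label):
    """Convert one label to its ASCII form; pure-ASCII labels pass through."""
    if label.isascii():
        return label
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def to_u_label(label):
    """Reverse of to_a_label: decode an 'xn--' label back to Unicode."""
    if label.startswith(ACE_PREFIX):
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label


def to_ascii_domain(domain):
    """Apply to_a_label to every dot-separated label of a domain name."""
    return ".".join(to_a_label(part) for part in domain.split("."))


homograph = "\u0261oogle.com"  # first letter is the IPA "ɡ", not ASCII "g"
ascii_form = to_ascii_domain(homograph)
print(ascii_form)  # the name as it would appear in a certificate
```

The ASCII form is what to look for in CT log search results; decoding it with to_u_label recovers the lookalike for display.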


Subdomains

Registries, like Verisign, explicitly enforce anti-homograph measures (disallowing mixed scripts) because they don’t want lookalike domains on their gTLDs. Public services that exist on a shared root, such as “s3.amazonaws.com”, “storage.googleapis.com”, or other services which allow users to create arbitrarily-named subdomains, should apply these same restrictions. They are effectively acting as registries for those roots in the same way Verisign does for “.com”.

Google acknowledges the social engineering risk and specifically prevents creating buckets that contain “google” in the name:

> Bucket names cannot contain "google" or close misspellings, such as "g00gle".

I tried to create a “google” bucket and sure enough, they blocked it. However, Google did allow registering bucket names that use Unicode Latin IPA Extension homoglyph characters.

Additionally, unlike domain names on gTLDs, Google allowed registering bucket names which contain mixed-scripts. To demonstrate this, I successfully registered the “gоogle” bucket, whose first “o” is a Cyrillic character.



Suggested Remediation

Services which allow customers to control resources at a customer-defined arbitrary subdomain should restrict the names of subdomains in the same way that Verisign (and registries in general) should restrict domain names. Specifically, subdomains that contain mixed scripts or homoglyph characters, such as those outlined within this disclosure, should be rejected.

An alternative remediation is to take the approach Amazon took for S3 bucket naming policies, which is to prevent Unicode in subdomains entirely by rejecting all buckets that begin with the punycode prefix, “xn--”.
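A check along these lines can be sketched in a few lines. This is an illustration only, not Amazon's implementation. The function name validate_subdomain_label is hypothetical, and applying the rule to every dot-separated label and rejecting raw non-ASCII input are assumptions of this sketch that go slightly beyond the prefix rule described above.

```python
PUNYCODE_PREFIX = "xn--"


def validate_subdomain_label(name):
    """Return a rejection reason, or None if the name is acceptable."""
    lowered = name.lower()
    # Assumption: raw Unicode input is rejected outright, so the only way to
    # express a non-ASCII name would be its Punycode form.
    if not lowered.isascii():
        return "non-ASCII characters are not allowed"
    # Reject the Punycode prefix on any dot-separated label.
    for label in lowered.split("."):
        if label.startswith(PUNYCODE_PREFIX):
            return "Punycode-encoded labels are not allowed"
    return None


for candidate in ["my-bucket", "xn--oogle-qmc", "\u0261oogle"]:
    reason = validate_subdomain_label(candidate)
    print(candidate.encode("unicode_escape").decode("ascii"), "->", reason or "accepted")
```

The trade-off of this approach is that it blocks all internationalized names, legitimate ones included, in exchange for not having to maintain a list of confusable characters.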

 

Q&A

 

If someone has registered a homograph of one of my properties, what do I do?

Submit an abuse report to the appropriate organization.

Amazon: abuse@amazonaws.com
Google: https://support.google.com/code/contact/cloud_platform_report?hl=en
Wasabi: support@wasabi.com
DigitalOcean: https://www.digitalocean.com/company/contact/abuse/
Verisign: https://www.verisign.com/en_US/company-information/contact-us/index.xhtml

What about Latin dotless I? What about Latin small ligature OE?

These (and a few other IPA characters) can also be considered homoglyphs. However, in my testing across varying platforms and font settings, those characters were far more discernible than the three IPA characters outlined in this disclosure. If these characters can be used to create lookalikes for your domain, by all means, register their permutations for your organization.


Which of the three characters is most "severe"?

The Voiced Velar Stop is by far the most convincing. On the majority of platforms and font settings, it is impossible to visually distinguish from its ASCII counterpart. The Latin Alpha is the second most convincing, especially if there are no ASCII “a” characters in the string. The Latin Iota is the least convincing of the group, and on many systems and fonts it can be discerned from its ASCII counterpart.

Nothing about this is new! Homoglyphs are a known problem. Why make noise about this?

It’s true that homoglyphs are a known problem. However, it’s not well-known that Latin Unicode IPA characters can be registered in Latin domains without violating the mixed-script rule. In my personal research prior to writing this post, I found no references that call out the ability to register domains with these IPA homoglyphs.

In my very recent discussions with Verisign, I was made aware of one publication by Mike Schiffman at Farsight Security which demonstrates this possibility on page 6 of the Global Internationalized Domain Name Homograph Report Q2 2018 document. The casual example of a homograph using a Latin Iota in that contact-form-gated PDF is the only reference to IPA homoglyphs I could find at the time of writing.

 

Which vendors have remediated this issue?

At the time of writing, only Amazon (S3) and Verisign. Verisign has deployed changes to gTLD registration rules to prevent domain registration for domains with these homoglyphs. I have provided feedback to ICANN for the current draft revisions to the Latin Label Generation Rules and the IDN Guidelines.

Were Verisign gTLDs the only affected TLDs?

No. Any TLD which allows Latin IPA characters is likely affected. However, the majority of the most popular sites on the internet use gTLDs, namely “.com”.

Why was the disclosure delayed after being reclassified as a zero-day?

Verisign requested additional time to establish mitigations for this vulnerability. Given the scope, criticality of remediation, and uniqueness of this issue, the “exceptional circumstance” clause of the vulnerability disclosure policy was invoked and Verisign was provided additional time, delaying public disclosure. This course of action was deemed to be in the best interest of Internet users.

The tool you created has a GitHub link at the top, but the code is not public. Why?

The code will not be public for the first two weeks following this disclosure. This is intended to prevent attackers from using my code to automate the registration of homographs for domains which they do not own. In two weeks, the code will be made public. The code is written in Go, compiled to WebAssembly, and runs entirely in the browser. Neither Soluble nor Bishop Fox hosts any APIs for this tool. Verisign's API is used for domain availability tests.

The CSS is funky and looks weird on my phone or XYZ browser. Can you fix it?

I am not a web developer. In fact, I’m not a professional developer of any kind. This is my best attempt to come up with a useful tool in a very short timeframe. Pull requests will be welcome once the code is public.

 

Using IPA characters in domains is a legitimate use-case. Why should it be forbidden?

The domain “ɑlphɑ.com”, which uses the IPA Latin Alpha characters, is a clever and legitimate use of IPA characters. However, this is easily confusable with the pre-existing “alpha.com” domain. The likelihood of this being abused trumps the “legitimate use-case” argument.

What were your general thoughts about this disclosure?

This particular case was by-and-large a disappointment due to the unresponsiveness of vendors. Kudos to Amazon and Verisign who, in my view, were the only vendors to take this issue seriously and alter their policies in a timely manner to address this vulnerability.

Of organizations that did respond, what was the response?

After providing a clear explanation of the vulnerability and multiple exploitation scenarios, some organizations did not see this as an issue that necessitated a fix and simply accepted the risk. In discussions with two vendors, I was told something to the effect of:

> This is an acceptable risk and a known problem with DNS. Issues [like this] that come through bug-bounty are rejected.

Other vendors simply never responded after acknowledging receipt of the disclosure report and the zero-day reclassification.

Vendors need to take this issue seriously. Playing “whack-a-mole” with abuse reports is reactive and not preventative. Vendors should instead take a proactive approach and prevent domains and subdomains containing these characters from being registered.

Why was the issue re-classified as a zero-day?

After identifying multiple HTTPS certificates issued for homograph domains in Certificate Transparency logs, as well as one “unofficial” JavaScript library hosted at a homograph of a prominent domain, I reclassified this as a zero-day.

To your knowledge, how many homograph domains had HTTPS certificates generated between 2017 and the present?

15, though this figure covers only the approximately 300 prominent domains I tested.

How long has this vulnerability been abused?

Evidence suggests between 2017 and the present. That said, Certificate Transparency logging only became mandatory for all certificates in April 2018, so earlier abuse may not appear in the logs.

Will you disclose which domains these were?

No.

Why has the number of active homograph domains and certificates trailed off in recent years? Was I a victim of this attack?

It is unlikely you were victimized by attackers using this technique. My speculation is that this vulnerability was only used in highly-targeted social engineering campaigns. I will further speculate that, based on the CT logs and recent browser changes in handling Unicode in URLs, abuse of this vulnerability was likely more prevalent 3+ years ago than it is today.

 

 

References

ICANN - Root Zone Label Generation Rules (RZ-LGR)

ICANN - RZ-LGR Process

ICANN - IDN Guidelines (current)

ICANN - IDN Guidelines (v4, in review)

 

 

Topics: zero-day, vulnerability, homograph, homoglyph


Written by Matt Hamilton

Matt Hamilton (OSCP) is a principal security researcher at Soluble, where he focuses on Kubernetes security research. He was formerly with Bishop Fox, where he worked on black-box penetration testing, application assessments, source code review, and mobile application review for clients, which included large global organizations and high-tech start-ups. Matt is responsible for over a dozen CVEs. Matt is a founding member of OpenToAll, an online team for security competitions whose purpose is to mentor newcomers to the security community. He is a responsible disclosure advocate and loves the Go programming language.