COMAR (Classification of COmpromised versus Maliciously Registered domains) is a joint research project of SIDN Labs, Afnic Labs, and Grenoble Alps University. The Franco-Dutch project addresses the problem of automatically distinguishing between domain names that cybercriminals register for malicious purposes and legitimate domain names that they exploit through vulnerable web applications. The project is designed to help intermediaries such as registrars and ccTLD registries further optimize their anti-abuse processes.
Domain names are easy-to-use shorthand for IP addresses and help us navigate the many online services that we use in our daily lives. While the vast majority of domain names are registered and used for benign purposes, cybercriminals misuse some of them, for instance to launch large-scale phishing attacks, drive-by downloads, and spam campaigns. Security organizations such as the Anti-Phishing Working Group (APWG) and StopBadware collect information about these misused domain names and make it available to their customers (e.g., hosting providers and domain name registries) in the form of URL blacklists.
Both the operational and research communities mainly distinguish two types of domain name abuse: legitimate domain names that criminals have compromised, and new domain names that have been registered specifically for malicious purposes. An example of a compromised domain name is studentflats.gr, a legitimate site running a WordPress installation that cybercriminals hacked to host a banking-related phishing site. This is visible in the blacklisted URL (http://studentflats.gr/wp-content/uploads/2016/.co.nz/login/personal-banking/login/auth_security.php), which contains an illegally installed banking script (/uploads/…/auth_security.php) underneath the WordPress directory (/wp-content). An example of a maliciously registered domain name is continue-details.com, which was used for a PayPal phishing site. This is visible in the blacklisted URL (http://paypal.com.login.continue-details.com/), which does not explicitly point to a malicious program such as a PHP script, but instead refers to a site set up specifically for the phish using a fifth-level domain name (continue-details.com provides the first and second levels, and the prefix paypal.com.login. adds three more).
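The two blacklisted URLs above can be taken apart with a few lines of Python to make the difference visible. This is only a minimal sketch: the function name `describe_blacklisted_url` and the two signals it reports (number of hostname levels, whether the path ends in a PHP script) are illustrative assumptions, not features of the COMAR classifier.

```python
from urllib.parse import urlsplit


def describe_blacklisted_url(url: str) -> dict:
    """Split a URL into its hostname levels and path (hypothetical helper)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Each dot-separated label is one level of the domain name.
    labels = host.split(".") if host else []
    path = parts.path or "/"
    return {
        "host": host,
        "levels": len(labels),
        "path": path,
        # Illustrative signal only: does the URL point at a PHP script?
        "has_script_path": path.lower().endswith(".php"),
    }


compromised = describe_blacklisted_url(
    "http://studentflats.gr/wp-content/uploads/2016/.co.nz/login/"
    "personal-banking/login/auth_security.php"
)
registered = describe_blacklisted_url("http://paypal.com.login.continue-details.com/")

print(compromised["levels"], compromised["has_script_path"])
print(registered["levels"], registered["has_script_path"])
```

For the compromised example, the hostname has two levels and the path leads to a script below /wp-content; for the maliciously registered example, the hostname has five levels and the path is empty. Neither signal is sufficient on its own to label a domain.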
The distinction between these two groups is critical because they require different mitigation actions by different intermediaries. For example, hosting providers together with webmasters typically concentrate on cleaning up the content of compromised websites, whereas domain registries (e.g., SIDN and Afnic) and registrars tend to focus on handling malicious domain name registrations.
From an operational point of view, intermediaries typically use URL blacklists in their security systems to automatically block malicious content. However, a compromised domain name requires more fine-grained mitigation. For example, if an intermediary simply blocks studentflats.gr, it also blocks the legitimate part of the site (the content that the WordPress installation serves to visitors). Instead, a security engineer needs to inspect the site's WordPress installation and remove the malicious PHP script from the hosting platform, either manually or automatically. This example illustrates why it is crucial to unambiguously label the domains of blacklisted URLs as compromised or maliciously registered, so that security systems can use the labels reliably.
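The effect of blocking granularity can be illustrated with a small sketch. The helper names `blocked_by_domain` and `blocked_by_url` are hypothetical, and the homepage URL `http://studentflats.gr/` is assumed here as a stand-in for the legitimate content of the site; real security systems are considerably more involved.

```python
from urllib.parse import urlsplit

PHISHING_URL = (
    "http://studentflats.gr/wp-content/uploads/2016/.co.nz/login/"
    "personal-banking/login/auth_security.php"
)
# Assumption: the homepage stands in for the site's legitimate content.
LEGITIMATE_URL = "http://studentflats.gr/"


def blocked_by_domain(url: str, blocked_domains: set) -> bool:
    """Coarse blocking: match the host or any of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in blocked_domains)


def blocked_by_url(url: str, blocked_urls: set) -> bool:
    """Fine-grained blocking: match only the exact blacklisted URL."""
    return url in blocked_urls


# Domain-level blocking also hits the legitimate page.
print(blocked_by_domain(PHISHING_URL, {"studentflats.gr"}))
print(blocked_by_domain(LEGITIMATE_URL, {"studentflats.gr"}))

# URL-level blocking leaves the legitimate page reachable.
print(blocked_by_url(PHISHING_URL, {PHISHING_URL}))
print(blocked_by_url(LEGITIMATE_URL, {PHISHING_URL}))
```

For a maliciously registered domain, the coarse domain-level rule is the appropriate one, which is why the label attached to a blacklisted domain determines which rule a system should apply.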
The ultimate goal of COMAR is to develop a machine learning-based classifier that labels blacklisted domains as compromised or maliciously registered, then extensively evaluate its accuracy, and implement it for a production-level environment. We also plan to study the attackers’ profit-maximizing behavior and their business models. We shall apply our classifier to unlabeled domain names of URL blacklists, for example, to answer the following question: do attackers prefer to register malicious domains, compromise vulnerable websites, or misuse domains of legitimate services such as cloud-based file-sharing services in their criminal activities?
All three COMAR partners have extensive experience in analyzing large heterogeneous datasets and in engineering the underlying platforms. Grenoble Alps University will concentrate on the statistical analysis of large-scale Internet measurement and incident data and on publishing scientific papers, whereas the two registry labs will focus on advancing the COMAR classifier for operational environments (e.g., at SIDN and Afnic) and on making it available to their stakeholders, such as .nl and .fr registrars. This complementary partnership is in line with the need for registries to continuously strengthen their capabilities in order to increase the security of their top-level domains (TLDs) and ultimately provide more trust for end users.
Sourena Maroofi, a Ph.D. student at Grenoble Alps University, will develop and evaluate the COMAR classifier under the supervision of Maciej Korczyński, COMAR's Principal Investigator. COMAR is funded by SIDN and Afnic, will start in October 2018, and will run for three years. The project's steering committee consists of Cristian Hesselman (SIDN Labs), Benoît Ampeau (Afnic Labs), and Maciej Korczyński (Drakkar team, Grenoble INP, Grenoble Alps University).
For more information, please contact the following steering committee members.
| Name and role | Email | Organization |
|---|---|---|
| Cristian Hesselman (Manager SIDN Labs) | cristian . hesselman AT sidn.nl | SIDN Labs |
| Benoît Ampeau (Director Partnerships & Innovations, Afnic Labs) | benoit . ampeau AT afnic.fr | Afnic Labs |
| Maciej Korczyński (Faculty member at GAU & PI of the COMAR project) | maciej . korczynski AT univ-grenoble-alpes .fr | Grenoble Alps University |