We are broadly interested in real-world security and privacy threats in healthcare and consumer technologies (see this video). We build systems to measure these threats at scale. Our techniques are heavily based on empirical measurements—hence our name, NYU mLab (also short for momoLab).
Our past research covers two broad areas:
Various government agencies (e.g., the FBI, the FTC, and the New York State Attorney General) have contacted us to help investigate a number of security and privacy threats related to our research. Our work has also been covered by multiple media outlets. Examples include:
Abstract: Many households include children who use voice personal assistants (VPAs) such as Amazon Alexa. Children benefit from the rich functionalities of VPAs and third-party apps but are also exposed to new risks in the VPA ecosystem (e.g., inappropriate content or information collection). To study the risks VPAs pose to children, we build a Natural Language Processing (NLP)-based system to automatically interact with VPA apps and analyze the resulting conversations to identify content that is risky to children. We identify 28 child-directed apps with risky content and maintain a growing dataset of 31,966 non-overlapping app behaviors collected from 3,434 Alexa apps. Our findings suggest that although voice apps designed for children are subject to more policy requirements and more intensive vetting, children are still vulnerable to risky content. We then conduct a user study showing that parents are more concerned about VPA apps with inappropriate content than about those that ask for personal information, but many parents are not aware that risky apps of either type exist. Finally, we identify a new threat to users of VPA apps: confounding utterances, or voice commands shared by multiple apps that may cause a user to invoke or interact with a different app than intended. We identify 4,487 confounding utterances, including 581 shared by child-directed and non-child-directed apps.
See also: Project Website
Abstract: Voice User Interfaces (VUIs) are increasingly common on many Internet of Things (IoT) devices. Amazon has the highest share in the voice-assistant market and supports more than 47,000 third-party applications (“skills”) on its Alexa platform to extend functionality. We study how Alexa’s design decisions when integrating these skills may create unintended security and privacy risks. Our survey of 237 participants finds that users do not understand that these skills are often operated by third parties. Additionally, people often confuse third-party skills with native Alexa functions. Finally, they are unaware of the functions that the native Alexa system supports. These misunderstandings may allow attackers to develop third-party skills that operate without users’ knowledge, or even to masquerade as native Alexa functions, posing new threats to user security and privacy. Based on our survey data, we make design recommendations, including visual and audio feedback, to help users distinguish native and third-party skills.
Abstract: The proliferation of smart home devices has created new opportunities for empirical research in ubiquitous computing, ranging from security and privacy to personal health. Yet, data from smart home deployments are hard to come by, and existing empirical studies of smart home devices typically involve only a small number of devices in lab settings. To contribute to data-driven smart home research, we crowdsource the largest known dataset of labeled network traffic from smart home devices within real-world home networks. To do so, we developed and released IoT Inspector, an open-source tool that allows users to observe the traffic from smart home devices on their own home networks. Between April 10, 2019 and January 21, 2020, 5,404 users installed IoT Inspector, allowing us to collect labeled network traffic from 54,094 smart home devices. At the time of publication, IoT Inspector is still gaining users and collecting data from more devices. We demonstrate how this data enables new research into smart homes through two case studies focused on security and privacy. First, we find that many device vendors, including Amazon and Google, use outdated TLS versions and send unencrypted traffic, sometimes to advertising and tracking services. Second, we discover that smart TVs from at least 10 vendors communicate with advertising and tracking services. Finally, we find widespread cross-border communications, sometimes unencrypted, between devices and Internet services that are located in countries with potentially poor privacy practices. To facilitate future reproducible research in smart homes, we will release the IoT Inspector data to the public.
Abstract: The number of Internet-connected TV devices has grown significantly in recent years, especially Over-the-Top ("OTT") streaming devices, such as Roku TV and Amazon Fire TV. OTT devices offer an alternative to multi-channel television subscription services and are often monetized through behavioral advertising. To shed light on the privacy practices of such platforms, we developed a system that can automatically download OTT apps (also known as channels) and interact with them while intercepting their network traffic and performing best-effort TLS interception. We used this smart crawler to visit more than 2,000 channels on two popular OTT platforms, namely Roku and Amazon Fire TV. Our results show that tracking is pervasive on both OTT platforms: traffic to known trackers is present on 69% of Roku channels and 89% of Amazon Fire TV channels. We also discover a widespread practice of collecting and transmitting unique identifiers, including WiFi MAC addresses and SSIDs. Moreover, a large number of trackers send data over unencrypted channels, potentially exposing it to malicious eavesdroppers. Finally, we show that the countermeasures available for these devices, such as limiting ad tracking options and adblocking, are practically ineffective. Based on our findings, we make recommendations for researchers, regulators, policy makers, and platform and app developers.
See also: Blog Post
Abstract: The proliferation of smart home Internet of Things (IoT) devices presents unprecedented challenges for preserving privacy within the home. In this paper, we demonstrate that a passive network observer (e.g., an Internet service provider) can infer private in-home activities by analyzing Internet traffic from commercially available smart home devices even when the devices use end-to-end transport-layer encryption. We evaluate common approaches for defending against these types of traffic analysis attacks, including firewalls, virtual private networks, and independent link padding, and find that none sufficiently conceal user activities with reasonable data overhead. We develop a new defense, "stochastic traffic padding" (STP), that makes it difficult for a passive network adversary to reliably distinguish genuine user activities from generated traffic patterns designed to look like user interactions. Our analysis provides a theoretical bound on an adversary's ability to accurately detect genuine user activities as a function of the amount of additional cover traffic generated by the defense technique.
See also: Blog Post
Abstract: We consider the problem of regulating products with negative externalities to a third party that is neither the buyer nor the seller, but where both the buyer and seller can take steps to mitigate the externality. The motivating example to have in mind is the sale of Internet-of-Things (IoT) devices, many of which have historically been compromised for DDoS attacks that disrupted Internet-wide services such as Twitter (Brian Krebs, 2017; Nicky Woolf, 2016). Neither the buyer (i.e., consumers) nor the seller (i.e., IoT manufacturers) was known to suffer from the attack, but both have the power to expend effort to secure their devices. We consider a regulator who regulates payments (via fines if the device is compromised, or market prices directly) or the product directly via mandatory security requirements.
Both regulations come at a cost: implementing security requirements increases production costs, and the existence of fines decreases consumers’ values, thereby reducing the seller’s profits. The focus of this paper is to understand the efficiency of various regulatory policies. That is, policy A is more efficient than policy B if A more successfully minimizes negative externalities while both A and B reduce the seller’s profits equally.
We develop a simple model to capture the impact of regulatory policies on a buyer’s behavior. In this model, we show that for homogeneous markets, where the buyer’s ability to follow security practices is always high or always low, the optimal (externality-minimizing for a given profit constraint) regulatory policy needs to regulate only payments or production. In arbitrary markets, by contrast, we show that while the optimal policy may require regulating both aspects, there is always an approximately optimal policy that regulates just one.
Abstract: In this paper, we present two web-based attacks against local IoT devices that any malicious web page or third-party script can perform, even when the devices are behind NATs. In our attack scenario, a victim visits the attacker’s website, which contains a malicious script that communicates with IoT devices on the local network that have open HTTP servers. We show how the malicious script can circumvent the same-origin policy by exploiting error messages on the HTML5 MediaError interface or by carrying out DNS rebinding attacks. We demonstrate that the attacker can gather sensitive information from the devices (e.g., unique device identifiers and precise geolocation), track and profile the owners to serve ads, or control the devices by playing arbitrary videos and rebooting them. We propose potential countermeasures to our attacks that users, browsers, DNS providers, and IoT vendors can implement.
Abstract: Ransomware is a type of malware that encrypts the files of infected hosts and demands payment, often in a cryptocurrency such as Bitcoin. In this paper, we create a measurement framework that we use to perform a large-scale, two-year, end-to-end measurement of ransomware payments, victims, and operators. By combining an array of data sources, including ransomware binaries, seed ransom payments, victim telemetry from infections, and a large database of Bitcoin addresses annotated with their owners, we sketch the outlines of this burgeoning ecosystem and its associated third-party infrastructure. In particular, we trace the financial transactions from the moment victims acquire bitcoins to when ransomware operators cash them out. We find that many ransomware operators cashed out using BTC-e, a now-defunct Bitcoin exchange. In total, we are able to track over $16 million in likely ransom payments made by 19,750 potential victims during a two-year period. While our study focuses on ransomware, our methods are potentially applicable to other cybercriminal operations that have similarly adopted Bitcoin as their payment channel.
Abstract: Digital currencies have flourished in recent years, buoyed by the tremendous success of Bitcoin. These blockchain-based currencies, called altcoins, have market capitalizations ranging from a few thousand to millions of dollars. Altcoins have attracted enthusiasts who enter the market by mining or buying them, but the risks and rewards can be significant, especially when the market is volatile. In this work, we estimate the potential profitability of mining and speculating on 18 altcoins using real-world blockchain and trade data. Using opportunity cost as a metric, we estimate the mining cost for an altcoin with respect to a more popular but stable coin. For every dollar invested in mining or buying a coin, we compute the potential returns under various conditions, such as the time of market entry and hold positions. While some coins offer the potential for spectacular returns, many follow a simple bubble-and-crash scenario, which highlights the extreme risks (and potential gains) in altcoin markets.
Abstract: Sites for online classified ads selling sex are widely used by human traffickers to support their pernicious business. The sheer quantity of ads makes manual exploration and analysis unscalable. In addition, discerning whether an ad is advertising a trafficked victim or an independent sex worker is a very difficult task. Very little concrete ground truth (i.e., ads definitively known to be posted by a trafficker) exists in this space. In this work, we develop tools and techniques that can be used separately and in conjunction to group sex ads by their true owner (and not the claimed author in the ad). Specifically, we develop a machine learning classifier that uses stylometry to distinguish between ads posted by the same vs. different authors with 96% accuracy. We also design a linking technique that takes advantage of leakages from the Bitcoin mempool, the blockchain, and the sex ad site to link a subset of sex ads to Bitcoin public wallets and transactions. Finally, we demonstrate, via a 4-week proof of concept using Backpage as the sex ad site, how an analyst can use these automated approaches to potentially find human traffickers.
Abstract: In this paper, we investigate a new form of blackhat search engine optimization that targets local listing services like Google Maps. Miscreants register abusive business listings in an attempt to siphon search traffic away from legitimate businesses and funnel it to deceptive service industries (such as unaccredited locksmiths) or to traffic-referral scams, often for the restaurant and hotel industry. In order to understand the prevalence and scope of this threat, we obtain access to over a hundred thousand business listings on Google Maps that were suspended for abuse. We categorize the types of abuse affecting Google Maps; analyze how miscreants circumvented the protections against fraudulent business registration, such as postcard mail verification; identify the volume of search queries affected; and ultimately explore how miscreants generated a profit from traffic that necessitates physical proximity to the victim. This physical requirement leads to unique abusive behaviors that are distinct from other online fraud such as pharmaceutical and luxury product scams.
See also: Slides
Abstract: In this paper, we present an empirical study of a recent spam campaign (a “stress test”) that resulted in a DoS attack on Bitcoin. The goal of our investigation is to understand the methods spammers used and their impact on Bitcoin users. To this end, we used a clustering-based method to detect spam transactions. We then validate the clustering results and generate a conservative estimate that 385,256 (23.41%) out of 1,645,667 total transactions were spam during the 10-day period at the peak of the campaign. We show that the campaign increased non-spam transaction fees from 45 to 68 Satoshis/byte (from $0.11 to $0.17 USD per kilobyte of transaction) on average and increased delays in processing non-spam transactions from 0.33 to 2.67 h on average, and we estimate the cost of this spam attack at 201 BTC (or $49,000 USD). We conclude by pointing out changes that could be made to Bitcoin transaction fees that would mitigate some of the spam techniques used to effectively DoS Bitcoin.
Abstract: At the current stratospheric value of Bitcoin, miners with access to significant computational horsepower are literally printing money. For example, the first operator of a $1,500 (USD) custom ASIC mining platform claims to have recouped his investment in less than three weeks in early February 2013, and the value of a bitcoin has more than tripled since then. Not surprisingly, cybercriminals have also been drawn to this potentially lucrative endeavor, but are instead leveraging the resources available to them: stolen CPU hours in the form of botnets. We conduct the first comprehensive study of Bitcoin mining malware and describe the infrastructure and mechanisms deployed by several major players. By carefully reconstructing the Bitcoin transaction records, we are able to deduce the amount of money a number of mining botnets have made.
Current role: I am an Assistant Professor at New York University's Tandon School of Engineering. I am part of the Electrical and Computer Engineering Department and the Center for Urban Science + Progress. I am also affiliated with NYU's Center for Cybersecurity, the Computer Science and Engineering Department, and the Center for Data Science.
Past research experience: Before joining NYU, I was a postdoctoral fellow at Princeton University, advised by Prof. Nick Feamster (who recently moved to the University of Chicago). I was affiliated with Princeton's Center for Information Technology Policy and Department of Computer Science.
I obtained my PhD in Computer Science from the University of California, San Diego, advised by Prof. Alex C. Snoeren and Prof. Kirill Levchenko (who recently moved to UIUC). My PhD dissertation uses cryptocurrencies to measure the financial activities of malicious actors and to uncover the potential identities of these actors.
I graduated from Williams College (Massachusetts) with a BA in Computer Science, advised by Prof. Jeannie Albrecht. At Williams, I also directed a series of Chinese cooking shows on Williamstown Community Television.
Why is it called NYU “mLab”? One of my long-term collaborators is momo (pictured below), who constantly travels with me for work and for leisure. She is the Supreme Director of mLab—short for momoLab.