If you are interested in working together (especially at the intersection of HCI, security, and privacy), join us!
We are broadly interested in real-world security and privacy threats in healthcare and consumer technologies (see this video). We build systems to measure these threats at scale. Our techniques are heavily based on empirical measurements—hence our name, NYU mLab (also short for momoLab).
Our past research covers two broad areas:
We were contacted by various government agencies—e.g., FBI, FTC, and New York State Attorney General—to help with the investigations of a number of security and privacy threats related to our research. Also, our work was covered in multiple media outlets. Examples include:
especially for users and non-users in home or school networks
and vulnerable communities, e.g., victims of intimate partner violence
Abstract: The network communication between Internet of Things (IoT) devices on the same local network has significant implications for security, privacy, and correctness. Yet, local network traffic has been largely ignored by prior literature, which typically focuses on studying the communication between devices and wide-area endpoints or detecting vulnerable IoT devices exposed to the Internet. In this paper, we present a comprehensive measurement study to shed light on the local communication within a smart home deployment and its associated threats. We use a unique combination of passive network traffic captures, honeypot interactions, and crowdsourced data from participants to identify a wide range of device activities on the local network. We then analyze these diverse datasets to characterize local network protocols, security and privacy threats associated with them, and real examples of information exposure due to local IoT traffic. Our analysis reveals vulnerable devices and insecure network protocols, how sensitive network and device data is exposed in the local network, and how this is abused by malicious actors and even exfiltrated to remote servers, potentially for tracking purposes. We will make our datasets and analysis publicly available to support further research in this area.
Abstract: IoT devices are increasingly used in consumer homes. Despite recent works in characterizing IoT TLS usage for a limited number of in-lab devices, there exists a gap in quantitatively understanding TLS behaviors from devices in the wild and server-side certificate management. To bridge this knowledge gap, we conduct a new measurement study by focusing on the practice of *device vendors*, through a crowdsourced dataset of network traffic from 2,014 real-world IoT devices across 721 global users. Through a new approach by identifying the sharing of TLS fingerprints across vendors and across devices, we uncover the prevalent use of customized TLS libraries (i.e., not matched to any known TLS libraries) and potential security concerns resulting from co-located TLS stacks of different services. Furthermore, we present the first known study on server-side certificate management for servers contacted by IoT devices. Our study highlights potential concerns in the TLS/PKI practice by IoT device vendors. We aim to raise visibility for these issues and motivate vendors to improve security practice.
Abstract: In intimate partner surveillance (IPS), abusers regularly leverage technology to spy on and stalk their partners. Recently, a new threat has emerged for IPS survivors: abusers have begun to use covert spy devices such as nanny cameras, item trackers, and audio recorders to enact IPS. To date, we lack an understanding of the prevalence and characteristics of these spy devices, making it difficult to prevent this form of IPS. We observe that many such spy devices can be found on mainstream retailers. Thus, in this work, we perform a systematic survey of spy devices sold through popular US retailers. By gathering 2,228 commercial spy devices and analyzing a representative sample, we find that not only can these devices be used for IPS, but many of them are advertised for use in IPS and other covert surveillance. One would hope that commercial and academic tools to detect hidden devices are similarly effective and could provide a solution for victims of covert spying. Unfortunately, through laboratory experiments, we find that commercial detection tools are all but defective, while recent academic detection systems require much refinement before they can be useful to survivors. We conclude with a call to action for the security community and outline reactive and preventative solutions for IPS via spy devices.
See also: Poster
Abstract: Tech-enabled interpersonal abuse (IPA) is a pervasive problem. Abusers, often intimate partners, use tools such as spyware to surveil and harass victims. Unfortunately, anecdotal evidence suggests that smart, Internet-connected devices such as home thermostats, cameras, and tracking tags may similarly be used against victims of IPA. To tackle abuse involving smart devices, it is vital that we understand the ecosystem of smart devices that enable IPA. Thus, in this work, we conduct the first large-scale qualitative analysis of the smart devices used in IPA. We systematically crawl Google Search results to uncover web pages discussing how abusers use smart devices to enact IPA. Through our analysis, we identify 32 devices used for IPA and detail the varied strategies abusers use for spying and harassment via these devices. Then, we devise a framework which divides IoT-enabled IPA into four overarching vectors---Covert Spying, Unauthorized Access, Repurposing, and Legitimate Use---that outline a path for future work. Finally, We discuss key challenges we, as a research community, must solve to prevent smart device-enabled abuse.
Abstract: Literature suggests the unmatched or conflicting privacy needs between users and bystanders in smart homes due to their different privacy concerns and priorities. A promising approach to mitigate such conflicts is through negotiation. Yet, it is not clear whether bystanders have privacy negotiation needs and if so, what factors may influence their negotiation intention and how to better support the negotiation to achieve their privacy goals. To answer these questions, we conducted a vignette study that varied across three categorical factors, including device types, device location, and duration of stay with 867 participants in the context of Airbnb. We further examined our participants' preferences regarding with whom, when, how, and why they would like to negotiate their privacy. Our findings showed that device type remained the only factor that significantly influenced our participants' negotiation intention. Additionally, we found our participants' other preferences, such as they preferred to contact Airbnb hosts first to convey their privacy needs through asynchronous channels (e.g., messages and emails). We summarized design implications to fulfill tenants' privacy negotiation needs.
See also: EdSurge News
Abstract: Increased use of technology in schools raises new privacy and security challenges for K-12 students---and harms such as commercialization of student data, exposure of student data in security breaches, and expanded tracking of students---but the extent of these challenges is unclear. In this paper, first, we interviewed 18 school officials and personnel to understand what educational technologies districts use and how they manage student privacy and security around these technologies. Second, to determine if these educational technologies are frequently endorsed across United States (US) public schools, we compiled a list of linked educational technology websites scraped from \domains K-12 public school/district domains and analyzed them for privacy risks. Our findings suggest that administrators lack resources to properly assess privacy and security issues around educational technologies even though they do pose potential privacy issues. Based on these findings, we make recommendations for policymakers, educators, and the CHI research community.
Abstract: Users face various privacy risks in smart homes, yet there are limited ways for them to learn about the details of such risks, such as the data practices of smart home devices and their data flow. In this paper, we present Privacy Plumber, a system that enables a user to inspect and explore the privacy "leaks" in their home using an augmented reality tool. Privacy Plumber allows the user to learn and understand the volume of data leaving the home and how that data may affect a user's privacy -- in the same physical context as the devices in question, because we visualize the privacy leaks with augmented reality. Privacy Plumber uses ARP spoofing to gather aggregate network traffic information and presents it through an overlay on top of the device in an smartphone app. The increased transparency aims to help the user make privacy decisions and mend potential privacy leaks, such as instruct Privacy Plumber on what devices to block, on what schedule (i.e., turn off Alexa when sleeping), etc. Our initial user study with six participants demonstrates participants' increased awareness of privacy leaks in smart devices, which further contributes to their privacy decisions (e.g., which devices to block).
Abstract: Smart home IoT devices are known to be breeding grounds for security and privacy vulnerabilities. Although some IoT vendors deploy updates, the update process is mostly opaque to researchers. It is unclear what software components are on devices, whether and when these components are updated, and how vulnerabilities change alongside the updates. This opaqueness makes it difficult to understand the security of software supply chains of IoT devices. To understand the software update practices on IoT devices, we leverage IoT Inspector's dataset of network traffic from real-world IoT devices. We analyze the User Agent strings from plain-text HTTP connections. We focus on four software components included in User Agents: cURL, Wget, OkHttp, and python-requests. By keeping track of what kinds of devices have which of these components at what versions, we find that many IoT devices potentially used outdated and vulnerable versions of these components---based on the User Agents---even though less vulnerable, more updated versions were available; and that the rollout of updates tends to be slow for some IoT devices.
Abstract: We developed IoT Inspector, an open-source tool that allows the owners' of smart home devices to monitor those devices' network traffic and discover potential security and privacy risks. In this article, we discuss some of what we discovered but also the problems we are facing in collecting reliable and accurate data.
Abstract: The opaque data practices in smart home devices have raised significant privacy concerns for smart home users and bystanders. One way to learn about the data practices is through privacy-related notifications. However, how to deliver these notifications to users and bystanders and increase their awareness of data practices is not clear. We surveyed 136 users and 123 bystanders to understand their preferences of receiving privacy-related notifications in smart homes. We further collected their responses to four mechanisms that improve privacy awareness (e.g., Data Dashboard) as well as their selections of mechanisms in four different scenarios (e.g., friend visiting ). Our results showed the pros and cons of each privacy awareness mechanism, e.g., Data Dashboard can help reduce bystanders' dependence on users. We also found some unique benefits of each mechanism (e.g., Ambient Light could provide unobtrusive privacy awareness). We summarized four key design dimensions for future privacy awareness mechanisms design.
Abstract: Many households include children who use voice personal assistants (VPA) such as Amazon Alexa. Children benefit from the rich functionalities of VPAs and third-party apps but are also exposed to new risks in the VPA ecosystem (e.g., inappropriate content or information collection). To study the risks VPAs pose to children, we build a Natural Language Processing (NLP)-based system to automatically interact with VPA apps and analyze the resulting conversations to identify contents risky to children. We identify 28 child-directed apps with risky contents and maintain a growing dataset of 31,966 non-overlapping app behaviors collected from 3,434 Alexa apps. Our findings suggest that although voice apps designed for children are subject to more policy requirements and intensive vetting, children are still vulnerable to risky content. We then conduct a user study showing that parents are more concerned about VPA apps with inappropriate content than those that ask for personal information, but many parents are not aware that risky apps of either type exist. Finally, we identify a new threat to users of VPA apps: confounding utterances, or voice commands shared by multiple apps that may cause a user to invoke or interact with a different app than intended. We identify 4,487 confounding utterances, including 581 shared by child-directed and non-child-directed apps.
Abstract: Voice User Interfaces (VUIs) are increasingly common on many Internet of Things (IoT) devices. Amazon has the highest share in the voice-assistant market and supports more than 47,000 third-party applications (“skills”) on its Alexa platform to extend functionality. We study how Alexa’s design decisions when integrating these skills may create unintended security and privacy risks. Our survey of 237 participants finds that users do not understand these skills are often operated by third parties. Additionally, people often confuse third-party skills with native Alexa functions. Finally, they are unaware of the functions that the native Alexa system supports. These misunderstandings may allow attackers to develop third-party skills that operate without users’ knowledge, or even to masquerade as native Alexa functions, posing new threats to user security and privacy. Based on our survey data, we make design recommendations, including visual and audio feedback, to help users distinguish native and third-party skills.
Abstract: The proliferation of smart home devices has created new opportunities for empirical research in ubiquitous computing, ranging from security and privacy to personal health. Yet, data from smart home deployments are hard to come by, and existing empirical studies of smart home devices typically involve only a small number of devices in lab settings. To contribute to data-driven smart home research, we crowdsource the largest known dataset of labeled network traffic from smart home devices from within real-world home networks. To do so, we developed and released IoT Inspector, an open-source tool that allows users to observe the traffic from smart home devices on their own home networks. Between April 10, 2019 and January 21, 2020, 5,404 users installed IoT Inspector, allowing us to collect labeled network traffic from 54,094 smart home devices. At the time of publication, IoT Inspector is still gaining users and collecting data from more devices. We demonstrate how this data enables new research into smart homes through two case studies focused on security and privacy. First, we find that many device vendors, including Amazon and Google, use outdated TLS versions and send unencrypted traffic, sometimes to advertising and tracking services. Second, we discover that smart TVs from at least 10 vendors communicated with advertising and tracking services. Finally, we find widespread cross-border communications, sometimes unencrypted, between devices and Internet services that are located in countries with potentially poor privacy practices. To facilitate future reproducible research in smart homes, we will release the IoT Inspector data to the public.
Abstract: The number of Internet connected TV devices has grown significantly in recent years, especially Over-the-Top ("OTT") streaming devices, such as Roku TV and Amazon Fire TV .OTT devices offer an alternative to multi-channel television subscription services and are often monetized through behavioral advertising.To shed light on the privacy practices of such platforms, we developed a system that can automatically download OTT apps (also known as channels) and interact with them while intercepting the network traffic and perform best-effort TLS interception. We used this smart crawler to visit more than 2,000 channels on two popular OTT platforms, namely Roku and Amazon Fire TV. Our results show that tracking is pervasive on both OTT platforms and traffic to known trackers is present on 69% of Roku channels and 89% of Amazon Fire TV channels. We also discover widespread practice of collecting and transmitting unique identifiers including WiFi MAC addresses and SSIDs. Moreover, a large number of trackers send data over unencrypted channels, potentially exposing it to malicious eavesdroppers. Finally we show that the countermeasures available for these devices, such as limiting ad tracking options and adblocking, are practically ineffective. Based on our findings, we make recommendations for researchers, regulators, policy makers, platform and app developers.
See also: Blog Post
Abstract: The proliferation of smart home Internet of Things (IoT) devices presents unprecedented challenges for preserving privacy within the home. In this paper, we demonstrate that a passive network observer (e.g., an Internet service provider) can infer private in-home activities by analyzing Internet traffic from commercially available smart home devices even when the devices use end-to-end transport-layer encryption. We evaluate common approaches for defending against these types of traffic analysis attacks, including firewalls, virtual private networks, and independent link padding, and find that none sufficiently conceal user activities with reasonable data overhead. We develop a new defense, "stochastic traffic padding" (STP), that makes it difficult for a passive network adversary to reliably distinguish genuine user activities from generated traffic patterns designed to look like user interactions. Our analysis provides a theoretical bound on an adversary's ability to accurately detect genuine user activities as a function of the amount of additional cover traffic generated by the defense technique.
See also: Blog Post
Abstract: We consider the problem of regulating products with negative externalities to a third party that is neither the buyer nor the seller, but where both the buyer and seller can take steps to mitigate the externality. The motivating example to have in mind is the sale of Internet-of-Things (IoT) devices, many of which have historically been compromised for DDoS attacks that disrupted Internet-wide services such as Twitter Brian Krebs (2017); Nicky Woolf (2016). Neither the buyer (i.e., consumers) nor seller (i.e., IoT manufacturers) was known to suffer from the attack, but both have the power to expend effort to secure their devices. We consider a regulator who regulates payments (via fines if the device is compromised, or market prices directly), or the product directly via mandatory security requirements.
Both regulations come at a cost—implementing security requirements increases production costs, and the existence of fines decreases consumers’ values—thereby reducing the seller’s profits. The focus of this paper is to understand the efficiency of various regulatory policies. That is, policy A is more efficient than policy B if A more successfully minimizes negatives externalities, while both A and B reduce seller’s profits equally.
We develop a simple model to capture the impact of regulatory policies on a buyer’s behavior. In this model, we show that for homogeneous markets—where the buyer’s ability to follow security practices is always high or always low—the optimal (externality-minimizing for a given profit constraint) regulatory policy need regulate only payments or production. In arbitrary markets, by contrast, we show that while the optimal policy may require regulating both aspects, there is always an approximately optimal policy which regulates just one.
Abstract: In this paper, we present two web-based attacks against local IoT devices that any malicious web page third-party script can perform, even when the devices are behind NATs. In our attack scenario, a victim visits the attacker’s website, which contains a malicious script that communicates with IoT devices on the local network that have open HTTP servers. We show how the malicious script can circumvent the same-origin policy by exploiting error messages on the HTML5 MediaError interface or by carrying out DNS rebinding attacks. We demonstrate that the attacker can gather sensitive information from the devices (e.g., unique device identifiers and precise geolocation), track and profile the owners to serve ads, or control the devices by playing arbitrary videos and rebooting. We propose potential countermeasures to our attacks that users, browsers, DNS providers, and IoT vendors can implement.
understanding the activities of cybercriminals
and their financial transactions/motivations
Abstract: Ransomware is a type of malware that encrypts the files of infected hosts and demands payment, often in a cryptocurrency such as Bitcoin. In this paper, we create a measurement framework that we use to perform a large-scale, two-year, end-to-end measurement of ransomware payments, victims, and operators. By combining an array of data sources, including ransomware binaries, seed ransom payments, victim telemetry from infections, and a large database of Bitcoin addresses annotated with their owners, we sketch the outlines of this burgeoning ecosystem and associated third-party infrastructure. In particular, we trace the financial transactions, from the moment victims acquire bitcoins, to when ransomware operators cash them out. We find that many ransomware operators cashed out using BTC-e, a now-defunct Bitcoin exchange. In total we are able to track over $16 million in likely ransom payments made by 19,750 potential victims during a two-year period. While our study focuses on ransomware, our methods are potentially applicable to other cybercriminal operations that have similarly adopted Bitcoin as their payment channel.
Abstract: Digital currencies have flourished in recent years, buoyed by the tremendous success of Bitcoin. These blockchain-based currencies, called altcoins, are associated with a few thousand to millions of dollars of market capitalization. Altcoins have attracted enthusiasts who enter the market by mining or buying them, but the risks and rewards could potentially be significant, especially when the market is volatile. In this work, we estimate the potential profitability of mining and speculating 18 altcoins using real-world blockchain and trade data. Using opportunity cost as a metric, we estimate the mining cost for an altcoin with respect to a more popular but stable coin. For every dollar invested in mining or buying a coin, we compute the potential returns under various conditions, such as time of market entry and hold positions. While some coins offer the potential for spectacular returns, many follow a simple bubble-and-crash scenario, which highlights the extreme risks—and potential gains—in altcoin markets.
Abstract: Sites for online classified ads selling sex are widely used by human traffickers to support their pernicious business. The sheer quantity of ads makes manual exploration and analysis unscalable. In addition, discerning whether an ad is advertising a trafficked victim or a independent sex worker is a very difficult task. Very little concrete ground truth (i.e., ads definitively known to be posted by a trafficker) exists in this space. In this work, we develop tools and techniques that can be used separately and in conjunction to group sex ads by their true owner (and not the claimed author in the ad). Specifically, we develop a machine learning classifier that uses stylometry to distinguish between ads posted by the same vs. different authors with 96% accuracy. We also design a linking technique that takes advantage of leakages from the Bitcoin mempool, blockchain and sex ad site, to link a subset of sex ads to Bitcoin public wallets and transactions. Finally, we demonstrate via a 4-week proof of concept using Backpage as the sex ad site, how an analyst can use these automated approaches to potentially find human traffickers.
Abstract: In this paper, we investigate a new form of blackhat search engine optimization that targets local listing services like Google Maps. Miscreants register abusive business listings in an attempt to siphon search traffic away from legitimate businesses and funnel it to deceptive service industries---such as unaccredited locksmiths---or to traffic-referral scams, often for the restaurant and hotel industry. In order to understand the prevalence and scope of this threat, we obtain access to over a hundred-thousand business listings on Google Maps that were suspended for abuse. We categorize the types of abuse affecting Google Maps; analyze how miscreants circumvented the protections against fraudulent business registration such as postcard mail verification; identify the volume of search queries affected; and ultimately explore how miscreants generated a profit from traffic that necessitates physical proximity to the victim. This physical requirement leads to unique abusive behaviors that are distinct from other online fraud such as pharmaceutical and luxury product scams.
See also: Slides
Abstract: In this paper, we present an empirical study of a recent spam campaign (a “stress test”) that resulted in a DoS attack on Bitcoin. The goal of our investigation being to understand the methods spammers used and impact on Bitcoin users. To this end, we used a clustering based method to detect spam transactions. We then validate the clustering results and generate a conservative estimate that 385,256 (23.41 %) out of 1,645,667 total transactions were spam during the 10 day period at the peak of the campaign. We show the impact of increasing non-spam transaction fees from 45 to 68 Satoshis/byte (from $0.11 to $0.17 USD per kilobyte of transaction) on average, and increasing delays in processing non-spam transactions from 0.33 to 2.67 h on average, as well as estimate the cost of this spam attack at 201 BTC (or $49,000 USD). We conclude by pointing out changes that could be made to Bitcoin transaction fees that would mitigate some of the spam techniques used to effectively DoS Bitcoin.
Abstract: At the current stratospheric value of Bitcoin, miners with access to significant computational horsepower are literally printing money. For example, the first operator of a USD $1,500 custom ASIC mining platform claims to have recouped his investment in less than three weeks in early February 2013, and the value of a bitcoin has more than tripled since then. Not surprisingly, cybercriminals have also been drawn to this potentially lucrative endeavor, but instead are leveraging the resources available to them: stolen CPU hours in the form of botnets. We conduct the first comprehensive study of Bitcoin mining malware, and describe the infrastructure and mechanism deployed by several major players. By carefully reconstructing the Bitcoin transaction records, we are able to deduce the amount of money a number of mining botnets have made.
Abstract: Software defined networks (SDNs) depart from traditional network architectures by explicitly allowing third-party software access to the network's control plane. Thus, SDN protocols such as OpenFlow give network operators the ability to innovate by authoring or buying network controller software independent of the hardware. However, this split design can make planning and designing large SDNs even more challenging than traditional networks. While existing network emulators allow operators to ascertain the behavior of traditional networks when subjected to a given workload, we find that current approaches fail to account for significant vendor-specific artifacts in the SDN switch control path. We benchmark OpenFlow-enabled switches from three vendors and illustrate how differences in their implementation dramatically impact latency and throughput. We present a measurement methodology and emulator extension to reproduce these control-path performance artifacts, restoring the fidelity of emulation.
Current role: I am an Assistant Professor at New York University's Tandon School of Engineering. I am a part of the Electrical and Computer Engineering Department and Center for Urban Science + Progress. I am also affiliated with NYU's Center for Cybersecurity, Computer Science and Engineering Department, and Center for Data Science.
Past research experience: Before joining NYU, I was a a postdoctoral fellow at Princeton University advised by Prof. Nick Feamster (who recently moved to University of Chicago). I was affiliated with Princeton's Center for Information Technology Policy and Department of Computer Science.
I obtained my PhD in Computer Science from University of California, San Diego, advised by Prof. Alex C. Snoeren and Prof. Kirill Levchenko (who recently moved to UIUC). My PhD dissertation uses cryptocurrencies to measure financial activities of malicious actors and to uncover potential identities of these actors.
I graduated from Williams College (Massachusetts) with a BA in Computer Science, advised by Prof. Jeannie Albrecht. At Williams, I also directed a series of Chinese cooking shows on Williamstown Community Television.
Why is it called NYU “mLab”? One of my long-term collaborators is momo (pictured below), who constantly travels with me for work and for leisure. She is the Supreme Director of mLab—short for momoLab.