D. Piorkowski, S. Park, A. Y. Wang, D. Wang, M. Muller and F. Portnoy. "How AI Developers Overcome Communication Challenges in a Multidisciplinary Team: A Case Study," ACM Conf. on Computer-Supported Cooperative Work and Social Computing (CSCW), 23 pages, 2021 [abstract]
Abstract: The development of AI applications is a multidisciplinary effort, involving multiple roles collaborating with the AI developers, an umbrella term we use to include data scientists and other AI-adjacent roles on the same team. During these collaborations, there is a knowledge mismatch between AI developers, who are skilled in data science, and external stakeholders who are typically not. This difference leads to communication gaps, and the onus falls on AI developers to explain data science concepts to their collaborators. In this paper, we report on a study including analyses of both interviews with AI developers and artifacts they produced for communication. Using the analytic lens of shared mental models, we report on the types of communication gaps that AI developers face, how AI developers communicate across disciplinary and organizational boundaries, and how they simultaneously manage issues regarding trust and expectations.
S. Park, A. Wang, B. Kawas, Q. V. Liao, D. Piorkowski and Marina Danilevsky. "Facilitating Knowledge Sharing from Domain Experts to Data Scientists for Building NLP Models," ACM Conf. on Intelligent User Interfaces (IUI), 18 pages, 2021 [abstract]
Abstract: Data scientists face a steep learning curve in understanding a new domain for which they want to build machine learning (ML) models. While input from domain experts could offer valuable help, such input is often limited, expensive, and generally not in a form readily consumable by a model development pipeline. In this paper, we propose Ziva, a framework to guide domain experts in sharing essential domain knowledge to data scientists for building NLP models. With Ziva, experts are able to distill and share their domain knowledge using domain concept extractors and five types of label justification over a representative data sample. The design of Ziva is informed by preliminary interviews with data scientists, in order to understand current practices of domain knowledge acquisition process for ML development projects. To assess our design, we run a mix-method case-study to evaluate how Ziva can facilitate interaction of domain experts and data scientists. Our results highlight that (1) domain experts are able to use Ziva to provide rich domain knowledge, while maintaining low mental load and stress levels; and (2) data scientists find Ziva's output helpful for learning essential information about the domain, offering scalability of information, and lowering the burden on domain experts to share knowledge. We conclude this work by experimenting with building NLP models using the Ziva output by our case study.
S. K. Kuttal, J. Myers, S. Gurka, D. Magar, D. Piorkowski and R. Bellamy. "Towards Designing Conversational Agents for Pair Programming: Accounting for Creativity Strategies and Conversational Styles," 2020 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), pp. 1-11, 2000 [abstract]
Abstract: Established research on pair programming reveals benefits, including increasing communication, creativity, self-efficacy, and promoting gender inclusivity. However, research has reported limitations such as finding a compatible partner, scheduling sessions between partners, and resistance to pairing. Further, pairings can be affected by predispositions to negative stereotypes. These problems can be addressed by replacing one human member of the pair with a conversational agent. To investigate the design space of such a conversational agent, we conducted a controlled remote pair programming study. Our analysis found various creative problem-solving strategies and differences in conversational styles. We further analyzed the transferable strategies from human-human collaboration to human-agent collaboration by conducting a Wizard of Oz study. The findings from the two studies helped us gain insights regarding design of a programmer conversational agent. We make recommendations for researchers and practitioners for designing pair programming conversational agent tools.
M. Muller, I. Lange, D. Wang, D. Piorkowski, J. Tsay, Q. V. Liao, C. Dugan and T. Erickson. "How Data Science Workers Work with Data: Discovery, Capture, Curation, Design, Creation," ACM Proc. Int'l Conf. Human Factors in Computing Systems (CHI), 14 pages, 2019 [abstract]
Abstract: With the rise of big data, there has been an increasing need for practitioners in this space and an increasing opportunity for researchers to understand their workflows and design new tools to improve it. Data science is often described as data-driven, comprising unambiguous data and proceeding through regularized steps of analysis. However, this view focuses more on abstract processes, pipelines, and workflows, and less on how data science workers engage with the data. In this paper, we build on the work of other CSCW and HCI researchers in describing the ways that scientists, scholars, engineers, and others work with their data, through analyses of interviews with 21 data science professionals. We set five approaches to data along a dimension of interventions: Data as given; as captured; as curated; as designed; and as created. Data science workers develop an intuitive sense of their data and processes, and actively shape their data. We propose new ways to apply these interventions analytically, to make sense of the complex activities around data practices.
T. Sandbank, M. Shmueli-Scheuer, D. Konopnicki, J. Herzig, J. Richards and D. Piorkowski. "Detecting Egregious Conversations between Customers and Virtual Agents," North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT), 2018 [abstract]
Abstract: Virtual agents are becoming a prominent channel of interaction in customer service. Not all customer interactions are smooth, however, and some can become almost comically bad. In such instances, a human agent might need to step in and salvage the conversation. Detecting bad conversations is important since disappointing customer service may threaten customer loyalty and impact revenue. In this paper, we outline an approach to detecting such egregious conversations, using behavioral cues from the user, patterns in agent responses, and useragent interaction. Using logs of two commercial systems, we show that using these features improves the detection F1-score by around 20% over using textual features alone. In addition, we show that those features are common across two quite different domains and, arguably, universal.
D. Piorkowski, S. Penney, A. Henley, M. Pistoia, M. Burnett, O. Tripp and P. Ferrara. "Foraging Goes Mobile: Foraging While Debugging on Mobile Devices," 2017 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), pp. 9-17, 2017 [abstract]
Abstract: Although Information Foraging Theory (IFT) research for desktop environments has provided important insights into numerous information foraging tasks, we have been unable to locate IFT research for mobile environments. Despite the limits of mobile platforms, mobile apps are increasingly serving functions that were once exclusively the territory of desktops—and as the complexity of mobile apps increases, so does the need for foraging. In this paper we investigate, through a theory-based, dual replication study, whether and how foraging results from a desktop IDE generalize to a functionally similar mobile IDE. Our results show ways prior foraging research results from desktop IDEs generalize to mobile IDEs and ways they do not, and point to challenging open research questions for foraging on mobile environments.
S. Srinivasa Ragavan, B. Pandya, D. Piorkowski, C. Hill, S. K. Kuttal, A. Sarma and M. Burnett. "PFIS-V: Modeling Foraging Behavior in the Presence of Variants," ACM Proc. Int'l Conf. Human Factors in Computing Systems (CHI), pp. 6232-6244, 2017 [abstract]
Abstract: Foraging among similar variants of the same artifact is a common activity, but computational models of Information Foraging Theory (IFT) have not been developed to take such variants into account. Without being able to computationally predict people's foraging behavior with variants, our ability to harness the theory in practical ways—such as building and systematically assessing tools for people who forage different variants of an artifact—is limited. Therefore, in this paper, we introduce a new predictive model, PFIS-V, that builds upon PFIS3, the most recent of the PFIS family of modeling IFT in programming situations. Our empirical results show that PFIS-V is up to 25% more accurate than PFIS3 in predicting where a forager will navigate in a variationed information space.
A. Aydin, D. Piorkowski, O. Tripp, P. Ferrara and M. Pistoia. "Visual Configuration of Mobile Privacy Policies," Int'l Conf. on Fundamental Approaches to Software Engineering (FASE), pp. 338-355, 2017 [abstract]
Abstract: Mobile applications often require access to private user information, such as the user or device ID, the location or the contact list. Usage of such data varies across different applications. A notable example is advertising. For contextual advertising, some applications release precise data, such as the user's exact address, while other applications release only the user's country. Another dimension is the user. Some users are more privacy demanding than others. Existing solutions for privacy enforcement are neither app- nor user- sensitive, instead performing general tracking of private data into release points like the Internet. The main contribution of this paper is in refining privacy enforcement by letting the user configure privacy preferences through a visual interface that captures the application's screens enriched with privacy-relevant information. We demonstrate the efficacy of our approach w.r.t. advertising and analytics, which are the main (third-party) consumers of private user information. We have implemented our approach for Android as the VisiDroid system. We demonstrate VisiDroid's efficacy via both quantitative and qualitative experiments involving top-popular Google Play apps. Our experiments include objective metrics, such as the average number of configuration actions per app, as well as a user study to validate the usability of VisiDroid.
D. Piorkowski, A. Z. Henley, T. Nabi, S. D. Fleming, C. Scaffidi and M. Burnett. "Foraging and Navigations, Fundamentally: Developers' Predictions of Value and Cost," ACM Proc. Int'l Symposium on the Foundations of Software Engineering (FSE), pp. 97-108, 2016 [abstract]
Abstract: Empirical studies have revealed that software developers spend 35%–50% of their time navigating through source code during development activities, yet fundamental questions remain: Are these percentages too high, or simply inherent in the nature of software development? Are there factors that somehow determine a lower bound on how effectively developers can navigate a given information space? Answering questions like these requires a theory that captures the core of developers' navigation decisions. Therefore, we use the central proposition of Information Foraging Theory to investigate developers' ability to predict the value and cost of their navigation decisions. Our results showed that over 50% of developers' navigation choices produced less value than they had predicted and nearly 40% cost more than they had predicted. We used those results to guide a literature analysis, to investigate the extent to which these challenges are met by current research efforts, revealing a new area of inquiry with a rich and crosscutting set of research challenges and open problems.
S. Srinivasa Ragavan, S. K. Kuttal, C. Hill, A. Sarma, D. Piorkowski and M. Burnett. "Foraging among an Overabundance of Similar Variants," ACM Proc. Int'l Conf. Human Factors in Computing Systems (CHI), pp. 3509-3521, 2016 [abstract]
Abstract: Foraging among too many variants of the same artifact can be problematic when many of these variants are similar. This situation, which is largely overlooked in the literature, is commonplace in several types of creative tasks, one of which is exploratory programming. In this paper, we investigate how novice programmers forage through similar variants. Based on our results, we propose a refinement to Information Foraging Theory (IFT) to include constructs about variation foraging behavior, and propose refinements to computational models of IFT to better account for foraging among variants.
D. Piorkowski, S. D. Fleming, C. Scaffidi, M. Burnett, I. Kwan, A. Z. Henley, J. Macbeth, C. Hill and A. Horvath. "To Fix or to Learn? How Production Bias Affects Developers' Information Foraging during Debugging," IEEE Int'l Conf. on Software Maintenance and Evolution (ICSME), pp. 11-20, 2015 [abstract]
Abstract: Developers performing maintenance activities must balance their efforts to learn the code vs. their efforts to actually change it. This balancing act is consistent with the "production bias" that, according to Carroll's minimalist learning theory, generally affects software users during everyday tasks. This suggests that developers' focus on efficiency should have marked effects on how they forage for the information they think they need to fix bugs. To investigate how developers balance fixing versus learning during debugging, we conducted the first empirical investigation of the interplay between production bias and information foraging. Our theory-based study involved 11 participants: half tasked with fixing a bug, and half tasked with learning enough to help someone else fix it. Despite the subtlety of difference between their tasks, participants foraged remarkably differently—making foraging decisions from different types of "patches," with different types of information, and succeeding with different foraging tactics.
D. Piorkowski, S. D. Fleming, I. Kwan, M. Burnett, C. Scaffidi, R. Bellamy and J. Jordahl.. "The Whats and Hows of Programmers' Foraging Diets," The Whats and Hows of Programmers' Foraging Diets (CHI), pp. 3063-3072, 2013 [abstract]
Abstract: One of the least studied areas of Information Foraging Theory is diet: the information foragers choose to seek. For example, do foragers choose solely based on cost, or do they stubbornly pursue certain diets regardless of cost? Do their debugging strategies vary with their diets? To investigate "what" and "how" questions like these for the domain of software debugging, we qualitatively analyzed 9 professional developers' foraging goals, goal patterns, and strategies. Participants spent 50% of their time foraging. Of their foraging, 58% fell into distinct dietary patterns—mostly in patterns not previously discussed in the literature. In general, programmers' foraging strategies leaned more heavily toward enrichment than we expected, but different strategies aligned with different goal types. These and our other findings help fill the gap as to what programmers' dietary goals are and how their strategies relate to those goals.
D. Piorkowski, S. D. Fleming, C. Scaffidi, C. Bogart, M. Burnett, B. E. John, R. Bellamy and C. Swart. "Reactive Information Foraging: An Empirical Investigation of Theory-Based Recommender Systems for Programmers," ACM Proc. Int'l Conf. Human Factors in Computing Systems (CHI), pp. 1471-1480, 2012 [abstract]
Abstract: Information Foraging Theory (IFT) has established itself as an important theory to explain how people seek information, but most work has focused more on the theory itself than on how best to apply it. In this paper, we investigate how to apply a reactive variant of IFT (Reactive IFT) to design IFT-based tools, with a special focus on such tools for ill-structured problems. Toward this end, we designed and implemented a variety of recommender algorithms to empirically investigate how to help people with the ill-structured problem of finding where to look for information while debugging source code. We varied the algorithms based on scent type supported (words alone vs. words + code structure), and based on use of foraging momentum to estimate rapidity of foragers' goal changes. Our empirical results showed that (1) using both words and code structure significantly improved the ability of the algorithms to recommend where software developers should look for information; (2) participants used recommendations to discover new places in the code and also as shortcuts to navigate to known places; and (3) low-momentum recommendations were significantly more useful than high-momentum recommendations, suggesting rapid and numerous goal changes in this type of setting. Overall, our contributions include two new recommendation algorithms, empirical evidence about when and why participants found IFT-based recommendations useful, and implications for the design of tools based on Reactive IFT.
D. Piorkowski, S. D. Fleming, C. Scaffidi, L. John, C. Bogart, B. E. John, M. Burnett and R. Bellamy. "Modeling programmer navigation: A head-to-head empirical evaluation of predictive models," 2011 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), pp. 109-116, 2011 [abstract]
Abstract: Software developers frequently need to perform code maintenance tasks, but doing so requires time-consuming navigation through code. A variety of tools are aimed at easing this navigation by using models to identify places in the code that a developer might want to visit, and then providing shortcuts so that the developer can quickly navigate to those locations. To date, however, only a few of these models have been compared head-to-head to assess their predictive accuracy. In particular, we do not know which models are most accurate overall, which are accurate only in certain circumstances, and whether combining models could enhance accuracy. Therefore, we have conducted an empirical study to evaluate the accuracy of a broad range of models for predicting many different kinds of code navigations in sample maintenance tasks. Overall, we found that models tended to perform best if they took into account how recently a developer has viewed pieces of the code, and if models took into account the spatial proximity of methods within the code. We also found that the accuracy of single-factor models can be improved by combining factors, using a spreading-activation based approach, to produce multi-factor models. Based on these results, we offer concrete guidance about how these models could be used to provide enhanced software development tools that ease the difficulty of navigating through code.
C. Bogart, M. Burnett, S. Douglass, D. Piorkowski and A. Shinsel. "Does My Model Work? Evaluation Abstractions of Cognitive Modelers," 2010 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), pp. 49-56, 2010 [abstract]
Abstract: Are the abstractions that scientific modelers use to build their models in a modeling language the same abstractions they use to evaluate the correctness of their models? The extent to which such differences exist seems likely to correspond to additional effort of modelers in determining whether their models work as intended. In this paper, we therefore investigate the distinction between "programming abstractions" and "evaluation abstractions". As the basis of our investigation, we conducted a case study on cognitive modeling. We report modelers' evaluation abstractions, and the lengths they went to in evaluating their models. From these results, we derive design implications for several categories of persistent, first-class evaluation abstractions in future debugging tools for modelers.