D. Piorkowski, S. Park, A. Y. Wang, D. Wang, M. Muller and F. Portnoy. "How AI Developers Overcome Communication Challenges in a Multidisciplinary Team: A Case Study," ACM Conf. on Computer-Supported Cooperative Work and Social Computing (CSCW), 23 pages, 2021 (to appear, preprint available)

Abstract: The development of AI applications is a multidisciplinary effort, involving multiple roles collaborating with the AI developers, an umbrella term we use to include data scientists and other AI-adjacent roles on the same team. During these collaborations, there is a knowledge mismatch between AI developers, who are skilled in data science, and external stakeholders who are typically not. This difference leads to communication gaps, and the onus falls on AI developers to explain data science concepts to their collaborators. In this paper, we report on a study including analyses of both interviews with AI developers and artifacts they produced for communication. Using the analytic lens of shared mental models, we report on the types of communication gaps that AI developers face, how AI developers communicate across disciplinary and organizational boundaries, and how they simultaneously manage issues regarding trust and expectations.

S. Park, A. Wang, B. Kawas, Q. V. Liao, D. Piorkowski and M. Danilevsky. "Facilitating Knowledge Sharing from Domain Experts to Data Scientists for Building NLP Models," ACM Conference on Intelligent User Interfaces (IUI), 18 pages, 2021 (to appear, preprint available)

Abstract: Data scientists face a steep learning curve in understanding a new domain for which they want to build machine learning (ML) models. While input from domain experts could offer valuable help, such input is often limited, expensive, and generally not in a form readily consumable by a model development pipeline. In this paper, we propose Ziva, a framework to guide domain experts in sharing essential domain knowledge to data scientists for building NLP models. With Ziva, experts are able to distill and share their domain knowledge using domain concept extractors and five types of label justification over a representative data sample. The design of Ziva is informed by preliminary interviews with data scientists, in order to understand current practices of domain knowledge acquisition process for ML development projects. To assess our design, we run a mix-method case-study to evaluate how Ziva can facilitate interaction of domain experts and data scientists. Our results highlight that (1) domain experts are able to use Ziva to provide rich domain knowledge, while maintaining low mental load and stress levels; and (2) data scientists find Ziva's output helpful for learning essential information about the domain, offering scalability of information, and lowering the burden on domain experts to share knowledge. We conclude this work by experimenting with building NLP models using the Ziva output by our case study.

S. K. Kuttal, J. Myers, S. Gurka, D. Magar, D. Piorkowski and R. Bellamy. "Towards Designing Conversational Agents for Pair Programming: Accounting for Creativity Strategies and Conversational Styles," 2020 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), pp. 1-11, 2020

Abstract: Established research on pair programming reveals benefits, including increasing communication, creativity, self-efficacy, and promoting gender inclusivity. However, research has reported limitations such as finding a compatible partner, scheduling sessions between partners, and resistance to pairing. Further, pairings can be affected by predispositions to negative stereotypes. These problems can be addressed by replacing one human member of the pair with a conversational agent. To investigate the design space of such a conversational agent, we conducted a controlled remote pair programming study. Our analysis found various creative problem-solving strategies and differences in conversational styles. We further analyzed the transferable strategies from human-human collaboration to human-agent collaboration by conducting a Wizard of Oz study. The findings from the two studies helped us gain insights regarding design of a programmer conversational agent. We make recommendations for researchers and practitioners for designing pair programming conversational agent tools.

M. Muller, I. Lange, D. Wang, D. Piorkowski, J. Tsay, Q. V. Liao, C. Dugan and T. Erickson. "How Data Science Workers Work with Data: Discovery, Capture, Curation, Design, Creation," ACM Proc. Int'l Conf. Human Factors in Computing Systems (CHI), 14 pages, 2019

Abstract: With the rise of big data, there has been an increasing need for practitioners in this space and an increasing opportunity for researchers to understand their workflows and design new tools to improve them. Data science is often described as data-driven, comprising unambiguous data and proceeding through regularized steps of analysis. However, this view focuses more on abstract processes, pipelines, and workflows, and less on how data science workers engage with the data. In this paper, we build on the work of other CSCW and HCI researchers in describing the ways that scientists, scholars, engineers, and others work with their data, through analyses of interviews with 21 data science professionals. We set five approaches to data along a dimension of interventions: Data as given; as captured; as curated; as designed; and as created. Data science workers develop an intuitive sense of their data and processes, and actively shape their data. We propose new ways to apply these interventions analytically, to make sense of the complex activities around data practices.

T. Sandbank, M. Shmueli-Scheuer, D. Konopnicki, J. Herzig, J. Richards and D. Piorkowski. "Detecting Egregious Conversations between Customers and Virtual Agents," North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT), 2018

Abstract: Virtual agents are becoming a prominent channel of interaction in customer service. Not all customer interactions are smooth, however, and some can become almost comically bad. In such instances, a human agent might need to step in and salvage the conversation. Detecting bad conversations is important since disappointing customer service may threaten customer loyalty and impact revenue. In this paper, we outline an approach to detecting such egregious conversations, using behavioral cues from the user, patterns in agent responses, and user-agent interaction. Using logs of two commercial systems, we show that using these features improves the detection F1-score by around 20% over using textual features alone. In addition, we show that those features are common across two quite different domains and, arguably, universal.

D. Piorkowski, S. Penney, A. Henley, M. Pistoia, M. Burnett, O. Tripp and P. Ferrara. "Foraging Goes Mobile: Foraging While Debugging on Mobile Devices," 2017 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), pp. 9-17, 2017
Honorable Mention

Abstract: Although Information Foraging Theory (IFT) research for desktop environments has provided important insights into numerous information foraging tasks, we have been unable to locate IFT research for mobile environments. Despite the limits of mobile platforms, mobile apps are increasingly serving functions that were once exclusively the territory of desktops—and as the complexity of mobile apps increases, so does the need for foraging. In this paper we investigate, through a theory-based, dual replication study, whether and how foraging results from a desktop IDE generalize to a functionally similar mobile IDE. Our results show ways prior foraging research results from desktop IDEs generalize to mobile IDEs and ways they do not, and point to challenging open research questions for foraging on mobile environments.

S. Srinivasa Ragavan, B. Pandya, D. Piorkowski, C. Hill, S. K. Kuttal, A. Sarma and M. Burnett. "PFIS-V: Modeling Foraging Behavior in the Presence of Variants," ACM Proc. Int'l Conf. Human Factors in Computing Systems (CHI), pp. 6232-6244, 2017

Abstract: Foraging among similar variants of the same artifact is a common activity, but computational models of Information Foraging Theory (IFT) have not been developed to take such variants into account. Without being able to computationally predict people's foraging behavior with variants, our ability to harness the theory in practical ways—such as building and systematically assessing tools for people who forage different variants of an artifact—is limited. Therefore, in this paper, we introduce a new predictive model, PFIS-V, that builds upon PFIS3, the most recent of the PFIS family of modeling IFT in programming situations. Our empirical results show that PFIS-V is up to 25% more accurate than PFIS3 in predicting where a forager will navigate in a variationed information space.

A. Aydin, D. Piorkowski, O. Tripp, P. Ferrara and M. Pistoia. "Visual Configuration of Mobile Privacy Policies," Int'l Conf. on Fundamental Approaches to Software Engineering (FASE), pp. 338-355, 2017

Abstract: Mobile applications often require access to private user information, such as the user or device ID, the location or the contact list. Usage of such data varies across different applications. A notable example is advertising. For contextual advertising, some applications release precise data, such as the user's exact address, while other applications release only the user's country. Another dimension is the user. Some users are more privacy demanding than others. Existing solutions for privacy enforcement are neither app- nor user- sensitive, instead performing general tracking of private data into release points like the Internet. The main contribution of this paper is in refining privacy enforcement by letting the user configure privacy preferences through a visual interface that captures the application's screens enriched with privacy-relevant information. We demonstrate the efficacy of our approach w.r.t. advertising and analytics, which are the main (third-party) consumers of private user information. We have implemented our approach for Android as the VisiDroid system. We demonstrate VisiDroid's efficacy via both quantitative and qualitative experiments involving top-popular Google Play apps. Our experiments include objective metrics, such as the average number of configuration actions per app, as well as a user study to validate the usability of VisiDroid.

D. Piorkowski, A. Z. Henley, T. Nabi, S. D. Fleming, C. Scaffidi and M. Burnett. "Foraging and Navigations, Fundamentally: Developers' Predictions of Value and Cost," ACM Proc. Int'l Symposium on the Foundations of Software Engineering (FSE), pp. 97-108, 2016
Distinguished Paper

Abstract: Empirical studies have revealed that software developers spend 35%–50% of their time navigating through source code during development activities, yet fundamental questions remain: Are these percentages too high, or simply inherent in the nature of software development? Are there factors that somehow determine a lower bound on how effectively developers can navigate a given information space? Answering questions like these requires a theory that captures the core of developers' navigation decisions. Therefore, we use the central proposition of Information Foraging Theory to investigate developers' ability to predict the value and cost of their navigation decisions. Our results showed that over 50% of developers' navigation choices produced less value than they had predicted and nearly 40% cost more than they had predicted. We used those results to guide a literature analysis, to investigate the extent to which these challenges are met by current research efforts, revealing a new area of inquiry with a rich and crosscutting set of research challenges and open problems.

S. Srinivasa Ragavan, S. K. Kuttal, C. Hill, A. Sarma, D. Piorkowski and M. Burnett. "Foraging among an Overabundance of Similar Variants," ACM Proc. Int'l Conf. Human Factors in Computing Systems (CHI), pp. 3509-3521, 2016
Best Paper

Abstract: Foraging among too many variants of the same artifact can be problematic when many of these variants are similar. This situation, which is largely overlooked in the literature, is commonplace in several types of creative tasks, one of which is exploratory programming. In this paper, we investigate how novice programmers forage through similar variants. Based on our results, we propose a refinement to Information Foraging Theory (IFT) to include constructs about variation foraging behavior, and propose refinements to computational models of IFT to better account for foraging among variants.

D. Piorkowski, S. D. Fleming, C. Scaffidi, M. Burnett, I. Kwan, A. Z. Henley, J. Macbeth, C. Hill and A. Horvath. "To Fix or to Learn? How Production Bias Affects Developers' Information Foraging during Debugging," IEEE Int'l Conf. on Software Maintenance and Evolution (ICSME), pp. 11-20, 2015

Abstract: Developers performing maintenance activities must balance their efforts to learn the code vs. their efforts to actually change it. This balancing act is consistent with the "production bias" that, according to Carroll's minimalist learning theory, generally affects software users during everyday tasks. This suggests that developers' focus on efficiency should have marked effects on how they forage for the information they think they need to fix bugs. To investigate how developers balance fixing versus learning during debugging, we conducted the first empirical investigation of the interplay between production bias and information foraging. Our theory-based study involved 11 participants: half tasked with fixing a bug, and half tasked with learning enough to help someone else fix it. Despite the subtlety of difference between their tasks, participants foraged remarkably differently—making foraging decisions from different types of "patches," with different types of information, and succeeding with different foraging tactics.

D. Piorkowski, S. D. Fleming, I. Kwan, M. Burnett, C. Scaffidi, R. Bellamy and J. Jordahl. "The Whats and Hows of Programmers' Foraging Diets," ACM Proc. Int'l Conf. Human Factors in Computing Systems (CHI), pp. 3063-3072, 2013

Abstract: One of the least studied areas of Information Foraging Theory is diet: the information foragers choose to seek. For example, do foragers choose solely based on cost, or do they stubbornly pursue certain diets regardless of cost? Do their debugging strategies vary with their diets? To investigate "what" and "how" questions like these for the domain of software debugging, we qualitatively analyzed 9 professional developers' foraging goals, goal patterns, and strategies. Participants spent 50% of their time foraging. Of their foraging, 58% fell into distinct dietary patterns—mostly in patterns not previously discussed in the literature. In general, programmers' foraging strategies leaned more heavily toward enrichment than we expected, but different strategies aligned with different goal types. These and our other findings help fill the gap as to what programmers' dietary goals are and how their strategies relate to those goals.

D. Piorkowski, S. D. Fleming, C. Scaffidi, C. Bogart, M. Burnett, B. E. John, R. Bellamy and C. Swart. "Reactive Information Foraging: An Empirical Investigation of Theory-Based Recommender Systems for Programmers," ACM Proc. Int'l Conf. Human Factors in Computing Systems (CHI), pp. 1471-1480, 2012

Abstract: Information Foraging Theory (IFT) has established itself as an important theory to explain how people seek information, but most work has focused more on the theory itself than on how best to apply it. In this paper, we investigate how to apply a reactive variant of IFT (Reactive IFT) to design IFT-based tools, with a special focus on such tools for ill-structured problems. Toward this end, we designed and implemented a variety of recommender algorithms to empirically investigate how to help people with the ill-structured problem of finding where to look for information while debugging source code. We varied the algorithms based on scent type supported (words alone vs. words + code structure), and based on use of foraging momentum to estimate rapidity of foragers' goal changes. Our empirical results showed that (1) using both words and code structure significantly improved the ability of the algorithms to recommend where software developers should look for information; (2) participants used recommendations to discover new places in the code and also as shortcuts to navigate to known places; and (3) low-momentum recommendations were significantly more useful than high-momentum recommendations, suggesting rapid and numerous goal changes in this type of setting. Overall, our contributions include two new recommendation algorithms, empirical evidence about when and why participants found IFT-based recommendations useful, and implications for the design of tools based on Reactive IFT.

D. Piorkowski, S. D. Fleming, C. Scaffidi, L. John, C. Bogart, B. E. John, M. Burnett and R. Bellamy. "Modeling programmer navigation: A head-to-head empirical evaluation of predictive models," 2011 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), pp. 109-116, 2011

Abstract: Software developers frequently need to perform code maintenance tasks, but doing so requires time-consuming navigation through code. A variety of tools are aimed at easing this navigation by using models to identify places in the code that a developer might want to visit, and then providing shortcuts so that the developer can quickly navigate to those locations. To date, however, only a few of these models have been compared head-to-head to assess their predictive accuracy. In particular, we do not know which models are most accurate overall, which are accurate only in certain circumstances, and whether combining models could enhance accuracy. Therefore, we have conducted an empirical study to evaluate the accuracy of a broad range of models for predicting many different kinds of code navigations in sample maintenance tasks. Overall, we found that models tended to perform best if they took into account how recently a developer has viewed pieces of the code, and if models took into account the spatial proximity of methods within the code. We also found that the accuracy of single-factor models can be improved by combining factors, using a spreading-activation based approach, to produce multi-factor models. Based on these results, we offer concrete guidance about how these models could be used to provide enhanced software development tools that ease the difficulty of navigating through code.

C. Bogart, M. Burnett, S. Douglass, D. Piorkowski and A. Shinsel. "Does My Model Work? Evaluation Abstractions of Cognitive Modelers," 2010 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), pp. 49-56, 2010

Abstract: Are the abstractions that scientific modelers use to build their models in a modeling language the same abstractions they use to evaluate the correctness of their models? The extent to which such differences exist seems likely to correspond to additional effort of modelers in determining whether their models work as intended. In this paper, we therefore investigate the distinction between "programming abstractions" and "evaluation abstractions". As the basis of our investigation, we conducted a case study on cognitive modeling. We report modelers' evaluation abstractions, and the lengths they went to in evaluating their models. From these results, we derive design implications for several categories of persistent, first-class evaluation abstractions in future debugging tools for modelers.

M. Arnold, R. K. E. Bellamy, M. Hind, S. Houde, S. Mehta, A. Mojsilovic, R. Nair, K. Natesan Ramamurthy, D. Reimer, A. Olteanu, D. Piorkowski, J. Tsay and K. R. Varshney. "FactSheets: Increasing Trust in AI Services through Supplier's Declarations of Conformity," IBM Journal of Research and Development, pp. 6:1-6:13, 2019

Abstract: Accuracy is an important concern for suppliers of artificial intelligence (AI) services, but considerations beyond accuracy, such as safety (which includes fairness and explainability), security, and provenance, are also critical elements to engender consumers’ trust in a service. Many industries use transparent, standardized, but often not legally required documents called supplier's declarations of conformity (SDoCs) to describe the lineage of a product along with the safety and performance testing it has undergone. SDoCs may be considered multidimensional fact sheets that capture and quantify various aspects of the product and its development to make it worthy of consumers’ trust. In this article, inspired by this practice, we propose FactSheets to help increase trust in AI services. We envision such documents to contain purpose, performance, safety, security, and provenance information to be completed by AI service providers for examination by consumers. We suggest a comprehensive set of declaration items tailored to AI in the Appendix of this article.

S. Srinivasa Ragavan, M. Codoban, D. Piorkowski, D. Dig and M. Burnett. "Version Control Systems: An Information Foraging Perspective," IEEE Transactions on Software Engineering (TSE), 11 pages, 2019

Abstract: Version Control Systems (VCS) are an important source of information for developers. This calls for a principled understanding of developers' information seeking in VCS, both for improving existing tools and for understanding requirements for new tools. Our prior work investigated empirically how and why developers seek information in VCS: in this paper, we complement and enrich our prior findings by reanalyzing the data via a theory's lens. Using the lens of Information Foraging Theory (IFT), we present new insights not revealed by the prior empirical work. First, while looking for specific information, participants' foraging behaviors were consistent with other foraging situations in SE; therefore, prior research on IFT-based SE tool design can be leveraged for VCS. Second, in change awareness foraging, participants consumed similar diets, but in subtly different ways than in other situations; this calls for further investigations into change awareness foraging. Third, while committing changes, participants attempted to enable future foragers, but the competing needs of different foraging situations led to tensions that participants failed to balance: this opens up a new avenue for research at the intersection of IFT and SE, namely, creating forageable information. Finally, the results of using an IFT lens on these data provide some evidence as to IFT's scoping and utility for the version control domain.

C. Scaffidi, S. D. Fleming, D. Piorkowski, M. Burnett, R. Bellamy, J. Lawrance and I. Kwan. "Information Foraging Theory Perspective on Tools for Debugging, Refactoring, and Reuse Tasks," ACM Transactions on Software Engineering and Methodology (TOSEM), 41 pages, 2013

Abstract: Theories of human behavior are an important but largely untapped resource for software engineering research. They facilitate understanding of human developers’ needs and activities, and thus can serve as a valuable resource to researchers designing software engineering tools. Furthermore, theories abstract beyond specific methods and tools to fundamental principles that can be applied to new situations. Toward filling this gap, we investigate the applicability and utility of Information Foraging Theory (IFT) for understanding information-intensive software engineering tasks, drawing upon literature in three areas: debugging, refactoring, and reuse. In particular, we focus on software engineering tools that aim to support information-intensive activities, that is, activities in which developers spend time seeking information. Regarding applicability, we consider whether and how the mathematical equations within IFT can be used to explain why certain existing tools have proven empirically successful at helping software engineers. Regarding utility, we applied an IFT perspective to identify recurring design patterns in these successful tools, and consider what opportunities for future research are revealed by our IFT perspective.

M. Hind, S. Houde, J. Martino, A. Mojsilovic, D. Piorkowski, J. Richards and K. R. Varshney. "Experiences with improving the transparency of AI models and services," Extended Abstracts of the 2020 CHI Conference on Human Factors in Computing Systems (CHI), 8 pages, 2020

Abstract: AI models and services are used in a growing number of high-stakes areas, resulting in a need for increased transparency. Consistent with this, several proposals for higher quality and more consistent documentation of AI data, models, and systems have emerged. Little is known, however, about the needs of those who would produce or consume these new forms of documentation. Through semi-structured developer interviews, and two document-creation exercises, we have assembled a clearer picture of these needs and the various challenges faced in creating accurate and useful AI documentation. Based on the observations from this work, supplemented by feedback received during multiple design explorations and stakeholder conversations, we make recommendations for easing the collection and flexible presentation of AI facts to promote transparency.

T. Nabi, K. M. D. Sweeney, S. Lichlyter, D. Piorkowski, C. Scaffidi, M. Burnett and S. D. Fleming. "Putting Information Foraging Theory to Work: Community-based Design Patterns for Programming Tools," 2016 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), pp. 129-133, 2016

Abstract: The design of programming tools is slow and costly. To ease this process, we developed a design pattern catalog aimed at providing guidance for tool designers. This catalog is grounded in Information Foraging Theory (IFT), which empirical studies have shown to be useful for understanding how developers look for information during development tasks. New design patterns, authored by members of the research community for the catalog, concretely explain how to apply IFT in tool design. In our evaluation, qualitative analyses revealed the community-written design patterns compared well in quality to patterns that we had ourselves published in a smaller, peer-reviewed catalog.

M. Muller, A. Wang, S. Ross, J. Weisz, M. Agarwal, K. Talamadupula, S. Houde, F. Martinez, J. Richards, J. Drozdal, X. Liu, D. Piorkowski and D. Wang. "How Data Scientists Improve Generated Code Documentation in Jupyter Notebooks," IUI 2021 Workshop on Human-AI Co-Creation with Generative Models (IUI), 13 pages, 2021 (to appear)

Abstract: Generative AI models are capable of creating high-fidelity outputs, sometimes indistinguishable from what could be produced by human effort. However, some domains possess an objective bar of quality, and the probabilistic nature of generative models suggests that there may be imperfections or flaws in their output. In software engineering, for example, code produced by a generative model may not compile, or it may contain bugs or logical errors. Various models of human-AI interaction, such as mixed-initiative user interfaces, suggest that human effort ought to be applied to a generative model's outputs in order to improve its quality. We report results from a controlled experiment in which data scientists used multiple models -- including a GNN-based generative model -- to generate and subsequently edit documentation for data science code within Jupyter notebooks. In analyzing their edit-patterns, we discovered various ways that humans made improvements to the generated documentation, and speculate that such edit data could be used to train generative models to not only identify which parts of their output might require human attention, but also how those parts could be improved.

S. Houde, V. Liao, J. Martino, M. Muller, D. Piorkowski, J. Richards, J. Weisz and Y. Zhang. "Business (mis)Use Cases of Generative AI," IUI 2020 Workshop on Human-AI Co-Creation with Generative Models (IUI), 6 pages, 2020

Abstract: Generative AI is a class of machine learning technology that learns to generate new data from training data. While deep fakes and media- and art-related generative AI breakthroughs have recently caught people's attention and imagination, the overall area is in its infancy for business use. Further, little is known about generative AI's potential for malicious misuse at large scale. Using co-creation design fictions with AI engineers, we explore the plausibility and severity of business misuse cases.

I. Kwan, S. D. Fleming and D. Piorkowski. "Information Foraging Theory for Collaborative Software Development," Proceedings of the CSCW 2012 Workshop on The Future of Collaborative Software Development (Future CSD), 3 pages, 2012

Abstract: Information foraging theory describes how people gather information based on a cost-benefit model. This theory has been successfully applied to the web domain and to software engineering tools. However, little work has been done on how information foraging theory can be applied to information-seeking behavior in a collaborative software engineering setting. This paper discusses how the theory might apply to information-seeking within collaborative software development teams, and how constructs of the theory might help aid in the design of tools and processes.

M. Hirzel, H. L. Ossher, D. J. Piorkowski and P. Tarr. "Conversational optimization of cognitive models," US Patent No. 10,810,994. Filed on 2018-07-19. Granted on 2020-10-20.

E. Duesterwald, A. Z. Henley, D. J. Piorkowski and J. T. Richards. "Framework of proactive and/or reactive strategies for improving labeling consistency and efficiency," (pending). Filed on 2017-12-28.

G. A. Baudart, J. T. Dolby, E. Duesterwald and D. J. Piorkowski. "Cognitive virtual detector," US Patent No. 10,574,598. Filed on 2017-10-18. Granted on 2020-02-25.

E. Duesterwald, Y. Chen, M. Desmond, H. L. Ossher and D. J. Piorkowski. "Filter for harmful training samples in online learning systems," US Patent No. 10,977,562. Filed on 2017-08-07. Granted on 2021-04-13.