Many-shot Jailbreaking (sanity.io)

Instructors as Innovators: A future-focused approach to new AI learning opportunities, with prompts

University of Pennsylvania - Wharton School

22.04.24

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4802463

This paper explores how instructors can leverage generative AI to create personalized learning experiences that transform teaching and learning. We present a range of AI-based exercises that enable novel forms of practice and application, including simulations, mentoring, coaching, and co-creation. For each type of exercise, we provide prompts that instructors can customize, along with guidance on classroom implementation, assessment, and risks to consider. We also provide blueprints: prompts that help instructors create their own original prompts. Instructors can leverage their content and pedagogical expertise to design these experiences, putting them in the role of builders and innovators. We argue that this instructor-driven approach has the potential to democratize the development of educational technology by enabling individual instructors to create AI exercises and tools tailored to their students' needs. While the exercises in this paper are a starting point, not definitive solutions, they demonstrate AI's potential to expand what is possible in teaching and learning.

Skill but not Effort Drive GPT Overperformance over Humans in Cognitive Reframing of Negative Scenarios

Harvard Business School, Harvard University

19.04.24

https://osf.io/preprints/psyarxiv/fzvd8

Recent advancements in large language models (LLMs), such as GPT, have led to their use in tasks involving emotional support. However, LLM performance has not been compared with that of humans in both the quality and the type of content produced. We examined this question by focusing on the skill of reframing negative situations to reduce negative emotions, also known as cognitive reappraisal. We trained both humans (N = 601) and GPT-4 to reframe negative vignettes (N_reappraisals = 4,195) and compared their performance using human raters (N = 1,744). GPT-4 outperformed humans on three of the four examined metrics. We investigated whether the gap was driven by effort or skill by incentivizing participants to produce better reappraisals; this increased the time spent on reappraisals but did not narrow the gap between humans and GPT-4. Content analysis showed that high-quality reappraisals produced by GPT-4 were more semantically similar to the emotional scenarios, indicating that GPT-4's success depends on tuning into the specific scenario. Results pointed in the opposite direction for humans, whose reappraisals were rated higher when they were more semantically different from the emotional scenario, suggesting that human success depends on generalizing away from the specific situation. These results help us understand the nature of emotional support provided by LLMs and how it compares to that of humans.
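The semantic-similarity finding above can be made concrete with a small sketch. The abstract does not specify the paper's actual similarity measure, so the following is a minimal, hypothetical illustration using bag-of-words cosine similarity between a scenario and two candidate reappraisals; the example texts are invented for demonstration.

```python
from collections import Counter
import math

def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between bag-of-words count vectors of two texts.

    A crude stand-in for whatever embedding-based measure the paper may
    have used; it only captures lexical overlap, not meaning.
    """
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Invented example: a scenario, a reappraisal that stays close to it,
# and a reappraisal that generalizes away from it.
scenario = "i failed my exam and feel like i will never graduate"
close = "failing one exam does not mean you will never graduate"
distant = "setbacks are a normal part of any long journey"

# The scenario-specific reappraisal scores higher on this lexical measure.
print(cosine_similarity(scenario, close) > cosine_similarity(scenario, distant))
# → True
```

Under the paper's finding, raters scored GPT-4's scenario-specific (higher-similarity) reappraisals more highly, while humans' more general (lower-similarity) reappraisals were the better-rated ones.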

Automated Social Science: Language Models as Scientist and Subjects

MIT, Harvard University