Data Science (MDSC)
MDSC 20009 Introduction to Data Science (3 Credit Hours)
"Introduction to Data Science" is an introductory course that will provide an overview of data science from both a computer science perspective and a social science perspective. This course will orient students to the field, to key concepts, to the types of questions addressed, to the technical aspects of data science, and to the process of making sense of data.
MDSC 20110 Archaeology of Hacking: Everything You Wanted to Know About Hacking But Were Afraid to Ask (3 Credit Hours)
"Hacking" is one of the most pressing topics of technological and societal interest. Yet, it is one of the most misunderstood and mischaracterized practices in the public sphere, given its ethical and technical complexities. In this course we will combine anthropological and computer science methods to explore the digital tools, practices, and sociocultural histories of hacking with a focus on their context of occurrence from the late 1960s to the present. Our goal is to help students think anthropologically about computing as well as technically about the digital mediations that we depend on in our lives.
Satisfies the following University Core Requirements: WKIN - Core Integration
MDSC 20309 Data in a Changing Planet: Environmental Data and Sustainability (3 Credit Hours)
This course presents an introduction to the socio-technical study of knowledge infrastructures in the context of a rising quest for environmental sustainability. It examines the critical role of data in supporting scientific research, environmental action, and sustainability efforts. The goal is to critically discuss the central place of data in a changing world where the proliferation of new digital technologies supports new capabilities for sensing, sharing, processing, and visualizing rapidly accelerating environmental change. This course will bring forward the interconnected technical, cultural, historical, political, and social efforts that make environmental data possible. Applying socio-technical lenses to environmental data, we will go through the different stages of an environmental data workflow, from data collection to visualization and reporting. The course will pay special attention to the local and global entities, past and present, that environmental data supports. We will focus on the implications of digital technologies for participatory and citizen science, open data, and data governance in the environmental space. Building on these critical tools, we will revisit ongoing environmental struggles and data-fueled sustainability efforts to assess the role of data in current and future attempts to restore and reinvent the integrity of our planet and its life-supporting systems. The course will draw on practical examples using environmental datasets and on a network of socio-environmental practitioners who will present selected topics throughout the semester.
MDSC 20647 Data and Artificial Intelligence Ethics (3 Credit Hours)
In the last decade, the Big Data revolution and developments in Artificial Intelligence (AI) have both created promise and raised several ethical issues. Emerging computational technologies have brought apparent benefits, while at the same time they seem to exacerbate social inequalities and even threaten our existence as a species. In this course, we will discuss the ethical and societal issues related to the development of AI and Big Data that have direct and concrete consequences for the way we perceive ourselves as persons and as members of society, and for the way we conceive our place as a species on this planet. These issues will be analyzed in light of major ethical theories, with special emphasis on virtue ethics. Recent works in virtue ethics are well positioned to make sense of the importance of our place as human beings on this planet, while also accounting for the indispensable roles that machines play in our environment.
The course is divided into three main parts. In the first part, I will introduce the main ethical frameworks, in particular virtue ethics. In the second part, we will discuss AI. Societal and ethical issues raised by AI include the threats it poses to the existence of our species; whether we should trust AI or instead find a way to build artificial agents with moral characteristics; and whether AI will do most of our jobs in the future and whether this scenario is desirable. In the third part, we will focus on selected issues concerning the Big Data revolution, such as how the autonomy of very complex algorithms can shape our lives in opaque ways and whether transparency is desirable; whether the design of algorithms may hide bias that leads to social inequalities; and how algorithms are changing the way healthcare is provided.
Upon successful completion of this course, you will be able to:
1. Define and sketch the focal points of virtue ethics and other relevant ethical theories.
2. Identify moral theories in arguments provided in support of or in opposition to the use of certain AI-related and Big Data technologies.
3. Compare different arguments and highlight their strengths and weaknesses.
Satisfies the following University Core Requirements: WKSP - Core 2nd Philosophy
MDSC 20919 Algorithms, Data, and Society (3 Credit Hours)
Algorithms and data increasingly influence our behavior, steer resources, and inform institutional decisions that affect our everyday lives. This course will examine the social forces that shape what information gets recorded in databases and how algorithms are constructed and used. It will also introduce various approaches for assessing how algorithms and big data impact the social world. Along the way, we'll tackle important questions raised by these technological developments: What opportunities and challenges emerge when machine learning is applied to data about people? How should we evaluate whether algorithms are better or worse than the systems they replace? How might algorithms shape our agency, relationships, and access to opportunity?
MDSC 21700 Intro to Text Analytics with Python (3 Credit Hours)
Explore the power of understanding textual data with the aid of the Python programming language. Students will acquire essential skills to analyze and visualize text data, as well as develop a new breadth of knowledge in an increasingly important domain that provides new, innovative solutions to traditional humanities problems. This course will blend traditional methods of studying texts alongside the most popular coding language used in the digital humanities in a non-intimidating and inclusive environment. No prior programming experience required. Coding activities will be heavily based on real humanities data sets for immediate, immersive, and practical skill building.
MDSC 30003 Baseball in America (3 Credit Hours)
Baseball is one of the most enduringly popular and significant cultural activities in the United States. Since the late 19th century, baseball has occupied an important place for those wishing to define and understand "America." Who has been allowed to play on what terms? How have events from baseball's past been remembered and re-imagined? What is considered scandalous and why (and who decides)? How has success in baseball been defined and redefined? Centering baseball as an industry and a cultural practice, this course will cover topics that include the political, economic, and social development of professional baseball in the United States; the rise of the organized baseball industry and Major League Baseball; and globalization in professional baseball. Readings for this course will include chapters from texts that include Rob Ruck's Raceball: How the Major Leagues Colonized the Black and Latin Game (2011), Adrian Burgos's Playing America's Game: Baseball, Latinos, and the Color Line (2007), Daniel Gilbert's Expanding the Strike Zone: Baseball in the Age of Free Agency (2013), Robert Elias's The Empire Strikes Out: How Baseball Sold U.S. Foreign Policy and Promoted the American Way Abroad (2010), and Michael Butterworth's Baseball and Rhetorics of Purity: The National Pastime and American Identity During the War on Terror (2010). Coursework may include response papers, primary source analysis, and a final project.
MDSC 30005 Simulating Politics and Global Affairs (3 Credit Hours)
Politics, markets, and the environment are all spheres of development that are fundamentally shaped by the action and interaction of many individuals over time. For example, the Arab Spring protests, the shortage of medicines in Caracas, and the rising water temperatures of the Baltic Sea are all system-level outcomes arising from the individual actions of thousands or even billions of people. In these spheres, leadership is often weak or non-existent. Scientists call these "complex systems." Complexity is difficult to study in the real world. Instead, scientists often approach these phenomena using computer simulations (sometimes called agent-based models, social network models, or computational models). The goal is to build computer models of development that link the actions and interactions of individuals to system-level outcomes. This class will use the perspective, literature, and tools of complexity science to approach core questions in the field of development.
MDSC 30020 Statistics and Its Discontents (3 Credit Hours)
Statistics is one of the most important tools for conducting and communicating the results of inquiry in virtually all disciplines today. From scientific research to public policy and business management, the history of statistics is the history of how one area of mathematics has come to be seen as providing the common language for making arguments, reasoning correctly, and objectively describing the world. Yet, at the same time, statistics has often come under criticism for its misrepresentation of reality, its ability to easily spread false or misleading information, and its tendency to marginalize or objectify vulnerable populations. From disputes over climate change and election results to subatomic particle detection, the more widespread statistics has become across domains of inquiry, the more it has found itself embroiled in controversy. This course will introduce students to some of the many controversies that have emerged over the course of statistics’ development throughout the nineteenth and twentieth centuries, offering a broad overview of the history of statistics in the West over the past 200 years. A central theme will be the complex dynamic between claims that statistics represent facts about the world and the fact that statistics are at the same time the product of competing social, political, racial, and cultural interests. No prior knowledge of statistics is required; however, students will be encouraged to reflect on the ways that the field of their major or chosen profession (e.g., physics, biology, law, or economics) has been shaped by statistics. By the end of this course, students will be able to better appreciate the different criticisms of statistics within their social and political contexts as well as to articulate how statistics has shaped modern conceptions of objectivity and standards of reasoning in different fields.
Satisfies the following University Core Requirements: WKHI - Core History, WRIT - Writing Intensive
MDSC 30056 Digital Empires: Social Networks, Geographical Motility, and Criminals of Early Chinese Empires (3 Credit Hours)
This course will provide advanced undergraduates and graduate students with a critical introduction to digital humanities for the study of early China, the fountainhead of Chinese civilization. In collaboration with the Center for Digital Scholarship, this course will focus on relational data with structured information on historical figures, especially high officials, of early Chinese empires. Throughout the semester, we will read academic articles, mine data from primary sources, and employ Gephi and ArcGIS to visualize data. The data we construct will address three major themes: how geographical mobility contributed to the solidarity of a newly unified empire across diverse regions, how social networks served as a hidden social structure channeling the flow of power and talent, and how criminal records and excavated legal statutes shed light on the distinctive understanding of law and its relationship with state power in Chinese history.
Satisfies the following University Core Requirements: WKHI - Core History
MDSC 30100 Open Government Data (3 Credit Hours)
Open government data (simply put, government-related data made freely available to the public) is on the rise. Our federal, state, and local governments are developing and implementing open data policies and infrastructure in efforts to foster transparency, economic development, and wider civic engagement and participation. We will investigate the technical, legal, and ethical implications of open data (e.g., using open content to train harmful artificial intelligence technologies), acknowledging that personal privacy and civic society are closely intertwined. Class meetings are split between reading discussions and engagement with data science tools and data collection/harvesting methods. Students will inspect the major laws and policies surrounding open government while also examining the social and technological challenges and advancements that shape the future of open data; for example, grassroots data intermediaries are obtaining and "translating" open government data for a public audience. In the spirit of open scholarship, students will develop their own "open data projects" by incorporating open-source tools. No prior knowledge of computer science or data science tools (e.g., R, Python) is required.
MDSC 30104 Data Feminism (3 Credit Hours)
Feminism isn't only about women, nor is feminism only for women. Feminism is about power: who has it and who doesn't. And in today's world, data is power. Data can be used to create communities, advance research, and expose injustice. But data can also be used to discriminate, marginalize, and surveil. This course will draw on intersectional feminist theory and activism to identify models for challenging existing power differentials in data science, with the aim of using data science methods and tools to work towards justice. Class meetings will be split between discussions of theoretical readings and explorations of data science tools and methods (such as Tableau, RStudio, and Python). Those readings may include chapters from texts that include Catherine D'Ignazio and Lauren Klein's Data Feminism (2020), Virginia Eubanks's Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor (2018), Ruha Benjamin's Race After Technology: Abolitionist Tools for the New Jim Code (2019), and Sasha Costanza-Chock's Design Justice: Community-Led Practices to Build the Worlds We Need (2020). This course will also examine the data advocacy and activism work undertaken by groups like Our Data Bodies, Data for Black Lives, the Anti-Eviction Mapping Project, and the Chicago-based Citizens Police Data Project. Over the course of the semester, students will develop original research projects that use data to intervene in issues of inequality and injustice.
This course is not about gaining mastery of particular data science tools or methods; therefore, familiarity with statistical analysis or data science tools (R, RStudio, Python, etc.) is NOT a prerequisite for this course.
MDSC 30109 R for Data Science (3 Credit Hours)
This class aims to equip students with basic knowledge of R in data manipulation, data generation, data visualization, and data analysis with a focus on data science. The first part of the class will introduce the very basics of R, including data types such as vectors, matrices, and data frames, as well as tibbles for refined data frames and big.matrix objects (from the bigmemory package) for big data. The second part of the class will introduce data manipulation and preprocessing methods such as data transformation, subsetting, and combination. The third part will deal with specific types of data such as strings, texts, dates and times, images, audio, and video. The fourth part will teach ggplot2 and related packages for data visualization. The last part of the class will illustrate how to conduct data analysis using the above techniques through case studies such as market basket analysis, network analysis, and log analysis. The class does not require previous knowledge of R.
MDSC 30114 Auditing AI: An Introduction (3 Credit Hours)
As artificial intelligence (AI) grows increasingly pervasive in society, it is essential that we develop an understanding of how AI systems work. A vital part of this understanding is a careful consideration of various risks (e.g., the presence of bias, a lack of transparency, regulatory compliance) when AI systems are designed and deployed in real-world settings. To understand and address these concerns, this course introduces students to the fundamentals of AI auditing: the practice of evaluating and improving the ethics of AI systems. Through a combination of interactive discussions and semi-technical lab sessions, students will develop an auditing “toolkit”. This toolkit includes both theoretical and technical concepts, especially relevant for the increasingly interdisciplinary teams of the modern workforce. Students will work on group case assignments as “audit committees” that reflect the needs of a variety of stakeholders (e.g., developers, managers, investors, users). Groups will identify and discuss potential concerns or risks associated with AI systems as well as develop recommendations to address them. Overall, the course aims to provide an interdisciplinary and hands-on introduction to AI auditing, allowing students to gain insights into the opportunities and challenges associated with designing and deploying AI systems that minimize societal risk and increase their effectiveness.
MDSC 30125 Race and Technologies of Surveillance (3 Credit Hours)
The United States has a long history of using its most cutting-edge science and technology to discriminate, marginalize, oppress, and surveil. The poorhouse and scientific charity of an earlier era have been replaced by digital tracking and automated decision-making systems like facial recognition and risk prediction algorithms. This course focuses on how automated systems are tasked with making life-and-death choices: which neighborhoods get policed, which families get food, who has housing, and who remains homeless. This course will examine black box tools used in K-12 education, social services, and the criminal justice system to better understand how these technologies reinforce and worsen existing structural inequalities and systems of oppression. Class meetings will be split between discussions of conceptual readings and applied work with technology systems. Readings for this course will draw on texts that include Safiya Noble's Algorithms of Oppression: How Search Engines Reinforce Racism (2018), Virginia Eubanks's Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor (2018), Catherine D'Ignazio and Lauren Klein's Data Feminism (2020), and Meredith Broussard's Artificial Unintelligence: How Computers Misunderstand the World (2019). This course will also examine the advocacy and activism work undertaken by groups like Our Data Bodies, Data for Black Lives, Algorithmic Justice League, Auditing Algorithms, Big Brother Watch, and the Chicago-based Citizens Police Data Project. Coursework may include response papers, hands-on work, and a final project. Familiarity with statistical analysis, data science, or computer science tools and methods is NOT a prerequisite for this course.
MDSC 30159 Critical Internet Geographies (3 Credit Hours)
In 1996, John Perry Barlow's "A Declaration of the Independence of Cyberspace" framed "the frontiers of Cyberspace" as an apolitical, borderless space where "our identities have no bodies." The invention of the World Wide Web and the continued evolution of internet technologies have drastically changed the norms for communication and community formation. However, internet technologies and platforms have also amplified deeply embedded structures of discrimination and oppression. Examples range from the Ku Klux Klan's creation of the Aryan Nations Liberty Net in the 1980s to the 21st-century #GamerGate campaign's targeted harassment of women in the video game industry. At the same time, internet technologies and platforms have supported and facilitated the work of activists, advocates, and grassroots organizers. This course moves beyond techno-optimism to critically examine the historical, cultural, social, and political significance of "the internet," from alternate internet histories to contemporary debates around regulation and access. Class meetings will be split between discussions of conceptual readings and applied work with internet technology systems. Readings for this course will draw on texts that include Janet Abbate's Inventing the Internet (1999), Lisa Nakamura and Peter Chow-White's Race After the Internet (2012), Safiya Umoja Noble's Algorithms of Oppression: How Search Engines Reinforce Racism (2018), Charlton McIlwain's Black Software: The Internet and Racial Justice, From the AfroNet to Black Lives Matter (2020), Marisa Elena Duarte's Network Sovereignty: Building the Internet Across Indian Country (2017), Jessie Daniels's Cyber Racism: White Supremacy Online and the New Attack on Civil Rights (2009), Andre Brock Jr.'s Distributed Blackness: African American Cybercultures (2020), and edited collections Race in Cyberspace (Routledge, 2010) and #HashtagActivism: Networks of Race and Gender Justice (MIT Press, 2020).
Coursework may include response papers, hands-on work, and a final project. Familiarity with data science or computer science tools and methods is NOT a prerequisite for this course.
MDSC 30161 Football in America (3 Credit Hours)
Football is one of the most enduringly popular and significant cultural activities in the United States. Since the late 19th century, football has occupied an important place for those wishing to define and understand "America." And Notre Dame football plays a central role in that story, with larger-than-life figures and stories, from Knute Rockne's "Win one for the Gipper" line to the Four Horsemen backfield that led the program to the 1924 national championship. The mythic proportions of the University's football program cast a long shadow on the institution's history, cultural significance, and traditions. This course focuses on Notre Dame football history as an entry point into larger questions about the cultural, historical, and social significance of football in the U.S. Who has been allowed to play on what terms? How have events from Notre Dame football's past been remembered and re-imagined? How has success in Notre Dame football been defined and redefined? In particular, the course will focus on how Notre Dame football became a touchstone for Catholic communities and institutions across the country navigating the fraught terrain of immigration, whiteness, and religious practice. This course will take up those questions through significant engagement with University Archives collections related to Notre Dame football, working toward increased levels of description and access for these materials. This course will include hands-on work with metadata, encoding and markup, digitization, and digital preservation/access through a collaboration with the University Archives and the Navari Family Center for Digital Scholarship.
Readings and viewings for this course will include chapters from texts such as Murray Sperber's Shake Down the Thunder: The Creation of Notre Dame Football (1993), Steve Delsohn's Talking Irish: The Oral History of Notre Dame Football (2001), Jerry Barca's Unbeatable: Notre Dame's 1988 Championship and the Last Great College Football Season (2014), David Roediger's Working Toward Whiteness: How America's Immigrants Became White (2005), David Roediger's The Wages of Whiteness: Race and the Making of the American Working Class (1991), and Noel Ignatiev's How the Irish Became White (1995), as well as the film Rudy (TriStar Pictures, 1993). Class meetings will be split between discussions of conceptual readings and applied work with library and information science technologies and systems. Coursework may include response papers, hands-on work with data, and a final project. Familiarity with archival methods, library/information science, data science, or computer science tools and methods is NOT a prerequisite for this course.
MDSC 30173 Video Games and the American West (3 Credit Hours)
Video Games and the American West will use digital games as its primary case studies to examine the modern cultural image of, understanding of, and interaction with the "space" of the American West. This class will provide historical understandings of the vast, varied, and often mythologized history of the American West, as well as its place as a site of continued colonial narratives and hegemonic imagery in contemporary popular media such as film, television, and video games. Through the close-playing of a variety of Western games, including installments from the Call of Juarez, Red Dead Redemption, and Horizon series, among many others, students will be asked to apply their knowledge of historical and contemporary understandings and uses of the West as a physical and cultural space to its visual and mechanical recreations within the digital realm of video games.
MDSC 30190 Sport and Big Data (3 Credit Hours)
Sport is one of the most enduringly popular and significant cultural activities in the United States. Data has always been a central part of professional sport in the US, from Henry Chadwick's invention of the baseball box score in the 1850s to the National Football League's use of Wonderlic test scores to evaluate players. This course focuses on the intersecting structures of power and identity that shape how we make sense of the "datafication" of professional sport. By focusing on the cultural significance of sport data, this course will put the datafication of sport in historical context and trace the ways the datafication of sport has impacted athletes, fans, media, and other stakeholders in the sport industry. The course will also delve into the technology systems used to collect and analyze sport data, from the TrackMan and PITCHf/x systems used in Major League Baseball to the National Football League's Next Gen Stats partnership to emerging computer vision and artificial intelligence research methods. Readings for this course will draw on texts like Christopher Phillips' Scouting and Scoring: How We Know What We Know About Baseball (2019), Ruha Benjamin's Captivating Technology: Race, Carceral Technoscience, and Liberatory Imagination in Everyday Life (2019), and Michael Lewis's Moneyball: The Art of Winning an Unfair Game (2004). Class meetings will be split between discussions of conceptual readings and applied work with sport data and technology systems. Coursework may include response papers, hands-on work with data, and a final project. Familiarity with statistical analysis, data science, or computer science tools and methods is NOT a prerequisite for this course.
MDSC 30411 Application, Ethics, and Governance of AI (3 Credit Hours)
The application of artificial intelligence is expanding rapidly and has the potential to reshape many fields, including transportation, finance, health care, marketing, social media, criminal justice, and public policy, to name just a few. AI's ability to predict human preferences and behavior, or even to substitute for human judgment in these fields, creates opportunities as well as concerns about safety, bias and discrimination, transparency, inequality, and job loss. Designed to serve students ranging from those with no background in AI to those with existing technical training, this course surveys current and emerging applications of AI in different fields and the related ethical issues and governance problems. The course targets students from different disciplines. Students from the humanities and social sciences will gain a deeper understanding of the technical aspects underpinning today's ethical and policy debates related to AI. Students with more technical backgrounds will better appreciate the ethical issues that arise in programming and engineering and understand how technology interacts with broader societal contexts. The course's goal is to encourage students to think proactively about the societal implications of technological change and to incorporate such understanding in their education and careers.
MDSC 30685 Introduction to Learning Analytics (3 Credit Hours)
The popularity of massive open online courses (MOOCs) and the shift to online learning during the COVID-19 pandemic have demonstrated the power of learning analytics (LA), the use of large-scale learning data to support teaching and learning practices. Although promising milestones have been achieved, the widespread adoption of LA is still in its infancy. In this course, we will introduce current topics in LA, including: What are the main concerns in the LA field? How do we build artificial intelligence (AI) models to identify patterns in historical learning data and make predictions about future learning? How can text mining approaches be used to analyze forum discussion data and track changes in students' emotional states? What data visualization skills are needed to support the analytical process and present results? We will also discuss ethical issues such as unintentional discrimination by AI algorithms, the trustworthiness of AI predictions, and privacy concerns related to the wide availability of learning data. Students in this course will engage in multiple projects based on publicly available learning datasets, with modular Python functions provided for the corresponding tasks.
MDSC 30705 Practical Data Visualization (3 Credit Hours)
Data visualization is about making the complex understandable. Whether this is a massive table of addresses, a relational database, or simply a very large dataset, this class will help you use modern, interactive applications to effectively communicate trends in your data. You will craft a variety of visualizations for different audiences, work with some special forms of data (e.g., social networks, multivariate data, and spatial data), and experiment with a variety of different tools for creating data visualizations. This course is designed to give students a broad overview of the field of data visualization.
MDSC 30750 Generative AI in the Wild (3 Credit Hours)
Generative AI is a form of computing in which computer systems generate media such as text, images, sound, video, or combinations of these based on prompts or other information provided to the computer. These systems, including but not limited to ChatGPT, Midjourney, and DALL-E, have been evolving rapidly and have led to extreme excitement, confusion, and fear. This course provides a survey of how to understand and use a number of these tools, including explorations in prompt engineering, and addresses issues from across the liberal arts, including artistic, economic, social/psychological, educational, and legal concerns and opportunities.
Satisfies the following University Core Requirements: WKIN - Core Integration
MDSC 30801 Language Processing in Practice (3 Credit Hours)
Natural Language Processing (NLP) has emerged as a crucial skill in the workforce, especially with the advent and accessibility of generative AI technologies. From intelligent chatbots and virtual assistants to automated content creation and sentiment analysis, NLP applications are transforming industries and redefining how we interact with technology. Mastery of NLP techniques and tools not only opens doors to careers in the technology sector but also equips students to contribute to innovations that shape our future.
Language Processing in Practice is a hands-on course designed to introduce students to the fundamental theory and applications of NLP, with a special emphasis on working with large language models, generative AI, and the Hugging Face ecosystem. The course focuses on practical techniques for processing, analyzing, generating, and understanding human language data.
Students will explore key topics such as text preprocessing, tokenization, part-of-speech tagging, parsing, sentiment analysis, topic modeling, machine translation, and text generation. The curriculum places a strong emphasis on modern NLP libraries and frameworks like NLTK, spaCy, and particularly Hugging Face Transformers. Through a series of projects and assignments, students will gain experience in building NLP applications, creating word embeddings with pre-trained large language models, and generating human-like text using generative AI models.
Basic proficiency in Python programming is required.
MDSC 30815 How to (Not) Lie with Statistics (3 Credit Hours)
Are stay-at-home orders effective during a pandemic? Should parents allow kids to have screen time? What role did demographic shifts play in the 2020 elections? Does the infield shift work? Modern society constantly faces questions that require data, statistics, and other empirical evidence to answer well. But the proliferation of niche media outlets, the rise of fake news, and the increase in academic research retractions make navigating potential answers to these questions difficult. This course is designed to give students tools to confront this challenge by developing their statistical and information literacy skills. It will demonstrate how data and statistical analyses are susceptible to a wide variety of known and implicit biases, which may ultimately lead consumers of information to make problematic choices. The course will consider this issue from the perspectives of consumers of research as well as researchers themselves. We will discuss effective strategies for reading and interpreting quantitative research while considering the incentives researchers face in producing it. Ultimately, students will complete the class better equipped to evaluate empirical claims made by news outlets, social media, instructors, and their peers. The goal is to encourage students to approach data-driven answers to important questions with appropriate tools rather than blind acceptance or excessive skepticism.
MDSC 33201 Geographic Information Systems (3 Credit Hours)
This course aims to provide a basic understanding of how Geographic Information Systems (GIS) and satellite imagery can be used to visualize and analyze environmental data. Students will learn basic techniques for analyzing, manipulating, and creating geospatial data in both pixel-based (satellite imagery and digital terrain models) and vector-based (point, line, and polygon representations of spatial data) formats. Students will also learn how to acquire high-resolution satellite imagery and other GIS data from online data servers.
MDSC 33450 Data Analytics and Economic Evaluation for Social Impact (3 Credit Hours)
Economics and data science are powerful tools that can improve people’s lives. This course will equip you to: (1) describe data, (2) use machine learning to predict the future, (3) establish cause and effect, and (4) address pressing social challenges using economic frameworks. There will be an emphasis on critical thinking – including the strengths and limitations of impact evaluation – and communicating research findings effectively with stakeholders. The course will feature guest speakers from public policy, tech, and economics. Focus areas will include: poverty alleviation, child protection, upward mobility, human-algorithm interaction, sports and culture, and other topics of student interest. Students will develop and present their own research project during the semester. Basic applied statistics background is recommended but can be waived with permission. This course is ideal for students hoping to see the big picture of an economics major, those in other disciplines passionate about the intersection of data analysis and social challenges, and those hoping to gain perspective on high-impact career opportunities.
Satisfies the following University Core Requirements: WRIT - Writing Intensive
MDSC 40120 Machine Learning for Social and Behavioral Research (3 Credit Hours)
In this day and age, we interact with many data collection tools. From swiping loyalty cards at the supermarket to receiving movie recommendations from Netflix or taking driving directions from a GPS, we leave a data footprint almost every day. Machine learning algorithms can help us turn raw datasets into valuable information. Machine learning has recently emerged as a major area of statistical research and is making its way into psychology. This course is an introductory seminar on the theory and application of machine learning to data analysis. Much research in psychology has focused on hypothesis-driven, explanatory approaches to data analysis. Machine learning can supplement a researcher’s analytic toolbox, helping to explore patterns in datasets and to assess the predictive value of various combinations of variables for several outcomes.
MDSC 40122 Machine Learning for Social and Behavioral Research (3 Credit Hours)
In this day and age, we interact with many data collection tools. From swiping loyalty cards at the supermarket to receiving movie recommendations from Netflix or taking driving directions from a GPS, we leave a data footprint almost every day. Machine learning algorithms can help us turn raw datasets into valuable information. Machine learning has recently emerged as a major area of statistical research and is making its way into psychology. This course is an introductory seminar on the theory and application of machine learning to data analysis. Much research in psychology has focused on hypothesis-driven, explanatory approaches to data analysis. Machine learning can supplement a researcher’s analytic toolbox, helping to explore patterns in datasets and to assess the predictive value of various combinations of variables for several outcomes.
MDSC 40124 Tests and Measures for Psychological Science (3 Credit Hours)
This course will demonstrate how it is possible to measure abstract psychological attributes in a principled way. The class will cover classical test theory (reliability and validity), item analysis and scaling, different types of tests (IQ, personnel, diagnostic), the development of new measures, and the basic statistical methods used for measurement. Issues related to fairness and measurement equivalence will be discussed.
MDSC 40211 Advanced Econometrics for Policy & Public Finance (3 Credit Hours)
The course covers the core methods necessary to read and conduct economic research using examples from the Public Finance literature. Students should have a good understanding of statistical inference and linear regression methods to be eligible. The course stresses the practical implementation of various econometric methodologies to analyze longitudinal datasets and "big data." Lectures will provide a comprehensive introduction to Randomized Controlled Trials (RCTs) and survey design, quasi-experimental research designs, non-parametric methods, Maximum Likelihood Estimation (MLE), the Generalized Method of Moments (GMM), time series econometrics, and modern machine learning. The course will also provide a refresher on hypothesis testing and model specification testing. Readings and practical problem sets will be posted each week to provide hands-on numerical experience to students.
MDSC 40410 Patterns of Life (3 Credit Hours)
This course focuses on the mathematical principles underlying the spatiotemporal patterns emerging in biological populations. Students are expected to be comfortable with calculus, differential equations, linear algebra, and elementary probability theory. The first part of the course focuses primarily on population genetics and evolutionary biology, while the second part focuses on reaction-diffusion equations and pattern formation. Students will be expected to solve quantitative problems and design simulations, and will be guided toward developing research projects related to theoretical and computational biology.
MDSC 40427 The Epidemiology and Ecology of Infectious Diseases (3 Credit Hours)
This course provides an introduction to epidemiology and disease ecology; topics covered include historical perspectives on disease, tracking of disease, spread of disease, and disease mitigation.
MDSC 40647 Data Visualization (3 Credit Hours)
Introduction to scientific and information visualization. Topics include visualization of scalar and vector fields (isosurface extraction, volume rendering, line integral convolution, and particle tracing); visual data representations (parallel coordinates, treemaps, and graph layouts); interactive techniques (focus+context visualization and coordinated multiple views); and solutions for big data visual analytics. Students will gain hands-on experience with a popular visualization programming library (D3.js) and a visualization toolkit (ParaView). Students will have the opportunity to learn, implement, and apply visualization techniques through assignments and projects.
MDSC 40810 Quantitative Political Analysis using Stata (3 Credit Hours)
Students in this course will learn to understand the most common statistical techniques used in political science and acquire the skills necessary to use these techniques and interpret their results. A mastery of these techniques is essential for understanding research on public opinion and voting behavior, electoral studies, and comparative research on the causes of democracy. For each topic, students will read works to orient them to key issues and debates. They will learn the reasoning behind the statistical analysis in these readings and create their own spreadsheet programs to execute such analyses. They will then download and clean datasets actually used in the published research, replicate selected analyses from these readings using the statistical package Stata, and write short papers evaluating the inferences defended in the published research.
MDSC 40811 Quantitative Political Analysis using R (3 Credit Hours)
This course is designed to achieve three objectives: (1) introduce you to research and quantitative analysis in political science, (2) help you become critical consumers of political information and policy-oriented reporting, and (3) give you the ability to answer questions of social scientific importance using data. Throughout the course, we'll discuss the complexities of generating good research designs, starting with how to ask interesting questions and how to measure concepts of interest to social scientists. Students in this course will learn to understand the most common statistical techniques used in political science and acquire the skills necessary to use these techniques and interpret their results. A mastery of these techniques is essential for understanding research on public opinion and voting behavior, electoral studies, and comparative research on the causes of democracy. The target audience for this course is undergraduate students with interest in the social sciences (not only Political Science), who want to use quantitative approaches to solve important problems, as well as develop marketable analytical skills.
MDSC 40815 Visualizing Politics (3 Credit Hours)
This course is an introduction to political, economic, and social issues through the medium of visual displays. This kind of course has become feasible because data are now abundant and easy to access, and software for displaying and analyzing data is available and easy to use. The ability to examine and display data is an increasingly valuable skill in many fields. However, this skill must be complemented by the ability to interpret visual displays orally, and by a commitment to use data responsibly: to reveal, rather than slant or distort, the truth. We will discuss examples concerning drugs, marriage, climate change, development, economic performance, social policy, democracy, voting, public opinion, and conflict, but the main emphasis is on helping you explore many facets of an issue of particular interest to you. You will learn to manage data and produce your own graphics to describe and explain political, social, economic (or other!) relationships. The graphics will include line and bar graphs, 2D and 3D scatterplots, motion charts, maps, and others.
MDSC 43099 Unlocking Social Puzzles with Data: Digital Tools for Sociologists (3 Credit Hours)
From the cultural content we consume, through apartment hunting, to dating, more and more aspects of people’s lives nowadays unfold on digital platforms. The vast quantities of data generated, together with the availability of data analysis tools and the computational resources that power them, provide social scientists with exciting opportunities to make sense of social phenomena that, previously, were often impossible to capture. This course introduces advanced undergraduates to modern methods sociologists and other social scientists use to collect and analyze data at scale, with an emphasis on the analysis of text and of spatial data. Together, we will discuss and think through recent research that applies these methods to gain insight into how social processes operate. We will gain hands-on experience with the open-source R programming language, powering much of this research. Through in-class activities, assignments, and an independent research project, we will develop skills in data collection, wrangling, and analysis using R, aiming to uncover hidden trends and answer social puzzles. Prior experience with R, specifically, is not necessary, although some statistics or programming background is required.
MDSC 43202 Visualizing Spatial Data (2 Credit Hours)
This course covers making maps and analyzing spatial information. It will show you how to use modern software to create maps and dashboards for spatial data. You will learn about the basics of Geographic Information Systems (GIS) and how to use this tool to create the maps and visualizations you need for any project.
MDSC 43316 Sociotechnical Studies of Data Science (3 Credit Hours)
This course provides an introduction to the emergent field of social studies of data-intensive analytics for the examination of how "things are done with data." The goal is to cover a wide range of examples and practical applications to introduce questions of design and implementation, privacy and surveillance, as well as governance and stewardship of digital tools and infrastructures. Following the performative aspect of data, we will explore social, technical, political, and economic dynamics that involve data extraction, sharing, literacy, and analysis. From little to big data practices, we will examine at the interface level the professional and institutional applications, development histories, and current political economy of data to situate ourselves as engaged technologists and researchers, not detached critics or passive users. There are no prerequisites for this course: no previous experience in statistics or programming is needed, but independent study of the supplementary materials we provide is highly encouraged.
MDSC 43402 Population Dynamics (3 Credit Hours)
Demography, the science of population, is concerned with virtually everything that influences, or can be influenced by, population size, distribution, processes, structure, or characteristics. This course pays particular attention to the causes and consequences of population change. Changes in fertility, mortality, migration, technology, lifestyle, and culture have dramatically affected the United States and the other nations of the world. These changes have implications for a number of areas: hunger, the spread of illness and disease, environmental degradation, health services, household formation, the labor force, marriage and divorce, care for the elderly, birth control, poverty, urbanization, business marketing strategies, and political power. An understanding of these is important as business, government, and individuals attempt to deal with the demands of the changing population.
MDSC 43919 Text Analysis for Social Science (3 Credit Hours)
Screens are all around us. From TVs to smartphones and e-books, the ubiquity of screens and the fact that we use them to communicate with one another mean that virtually all of us create some form of "text data" every day.
Further, the proliferation of mass communication technologies over the past couple of decades (including the rise of social media, the emphasis on document digitization in archives, libraries, and organizations, and increasing access to these data) has opened the door to new questions for social scientists and to new data and methods for answering these questions. For example, do anti-immigration laws shape how people tweet about immigration? Does war shape how U.S. presidents frame the role of governance in society, as reflected in State of the Union addresses? What accounts for the gender gap in net neutrality activism? Did national news media or activist social media matter more for sparking #BlackLivesMatter? Can Twitter sentiment predict stock market activity?
This course will introduce students to some of the methods that social scientists use to answer these types of questions. The focus will be on understanding and developing some of the fundamentals for designing and conducting text analysis projects from a social science perspective. We will also touch on some of the more advanced topics in this rapidly growing field. Hands-on analysis in the R statistical computing environment will be integral to the course, though no prior coding experience is required.
MDSC 43990 Social Networks (3 Credit Hours)
Social networks are an increasingly important form of social organization. Social networks help to link persons with friends, families, co-workers, and formal organizations. Via social networks, information flows, support is given and received, trust is built, resources are exchanged, and interpersonal influence is exerted. Rather than being static, social networks are dynamic entities. They change as people form and dissolve social ties to others during the life course. Social networks have always been an important part of social life: in our kinship relations, our friendships, at work, in business, in our communities and voluntary associations, in politics, in schools, and in markets. Our awareness of and ability to study social networks have increased dramatically with the advent of social media and new communication tools through which people interact with others. Through email, texting, Facebook, Twitter, and other platforms, people connect and communicate with others and leave behind traces of those interactions. This provides a rich source of data that we can use to better understand our connections to each other; how these connections vary across persons and change over time; and the impact that they have on our behaviors, attitudes, and tastes. This course will introduce students to (1) important substantive issues about, and empirical research on, social networks; (2) theories about network evolution and network effects on behavior; and (3) tools and methods that students can use to look at and analyze social networks. The course will be a combination of lectures, discussions, and labs. Course readings will include substantive research studies, theoretical writings, and methodological texts. Through this course, students will learn about social networks by collecting and analyzing social network data.
MDSC 48000 Directed Research (3 Credit Hours)
Directed Research in Data Science offers students a chance to engage in hands-on research, either by working on a faculty member's research project or by pursuing one's own research question unrelated to a senior thesis project. By the end of this course, students should demonstrate a deepened sense of empiricism and methodological understanding. This is a graded course, and a formal application is required. (See the DUS for a copy.) Students engaged in a faculty member's research project should work out a study plan and evaluation process for assigning a final grade with the faculty member. Students engaged in their own research project should (1) submit their research questions, hypotheses, data source, and methodology to their faculty director at the time of application to the course, and (2) submit a written research report by the end of the semester, as part of the final evaluation process. Department Approval Required.
MDSC 48009 iTREDS Capstone Experience (1-3 Credit Hours)
The senior capstone project provides iTREDS scholars the opportunity to work in interdisciplinary teams and with external stakeholders on topics that require data science fundamentals and application. iTREDS scholars co-create their project with a stakeholder partner (an industry, community-based, regional, or not-for-profit organization).
Course may be repeated.