Creativity in Programming

Theodore, William (Spring 2024)

Permanent URL: https://etd.library.emory.edu/concern/etds/zg64tn650?locale=zh

Abstract

Creative thinking is a valuable skill in professional and academic settings. Being able to quantitatively define and measure creativity is a fundamental step toward helping students improve it. However, in the context of computer programming, effectively measuring creativity is still an open problem.

In this paper, we present a clustering-based framework for assessing the creativity of computer programmers from the code they write. In particular, we focus on measuring three dimensions of creativity: (1) originality, i.e., how much an individual programmer's solution differs from other programmers' solutions to the same problem; (2) fluency, i.e., how many solutions a single programmer can produce; and (3) flexibility, i.e., how many substantially different solutions to the same problem an individual programmer is able to write.

We evaluate these dimensions of creativity using a machine-learning model that transforms computer programs into code embeddings, which are real-valued vectors summarizing the semantics of a program in an abstract set of features. We use these embeddings to cluster programs into semantically similar solution types. The distance between a solution and the cluster centers can provide a measure of originality. When we have access to multiple solutions by the same programmer, we can evaluate flexibility by determining the number of clusters the solutions belong to.
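
To make this pipeline concrete, the sketch below illustrates one way such scores could be computed once embeddings are available. It assumes the code embeddings have already been produced by an embedding model; the choice of k-means clustering, Euclidean distance, the number of clusters, and all function names and data are illustrative assumptions rather than the implementation used in the thesis.

    # Illustrative sketch only: embeddings are assumed to be precomputed
    # real-valued vectors, one row per submitted program.
    import numpy as np
    from sklearn.cluster import KMeans

    def originality_scores(embeddings, n_clusters=3, random_state=0):
        """Cluster the embeddings and score each program by its distance to
        its assigned cluster center (a larger distance suggests more originality)."""
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
        labels = km.fit_predict(embeddings)
        distances = np.linalg.norm(embeddings - km.cluster_centers_[labels], axis=1)
        return labels, distances

    def fluency(student_labels):
        """Fluency: how many solutions one programmer produced."""
        return len(student_labels)

    def flexibility(student_labels):
        """Flexibility: how many distinct solution clusters one programmer's
        solutions fall into."""
        return len(set(student_labels))

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        embeddings = rng.normal(size=(12, 16))  # 12 hypothetical programs, 16-dim embeddings
        labels, distances = originality_scores(embeddings)
        one_student = labels[:4]                # pretend the first four programs share an author
        print("originality:", distances[:4].round(2))
        print("fluency:", fluency(one_student))
        print("flexibility:", flexibility(one_student))

Under this reading, a solution that lies far from its cluster center receives a higher originality score, and a programmer whose solutions fall into several different clusters receives a higher flexibility score.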

We evaluate this approach using a preexisting dataset and new experimental data. The distribution of solutions and the resulting originality scores are generally consistent with theoretical predictions. We also compare student-written code with AI-generated code from OpenAI's ChatGPT, one of the most popular large language models. The AI-generated programs tend to have higher originality scores than the student-written programs in our dataset.

Finally, we conducted an experiment with students in the second Computer Science course at Emory, in which each student solved a single problem repeatedly, allowing flexibility to be assessed. We found substantial variation across students' results, which generally followed expectations. When evaluating the system against human graders, we found moderate agreement, supporting the viability of the system.

Table of Contents

1. Introduction
2. Background
   2.1. Studying
   2.2. Measuring Creativity in Programming
   2.3. Word Embeddings and Code Embeddings
3. Approach
   3.1. Method Overview and Definitions
   3.2. Data
      3.2.1. CS170 Dataset
      3.2.2. AI-generated Programs
      3.2.3. Experimental Dataset
      3.2.4. Human-assessment Dataset
   3.3. Preprocessing
   3.4. Evaluation of Clustering Methods
   3.5. Validation of Originality Scores on the CS170 Dataset
   3.6. Repeated Solutions Experiment
   3.7. Comparison with Human Analysis
4. Experiments
   4.1. Evaluation of Clustering Methods
      4.1.1. Clustering a Single Question
   4.2. Validation of Originality Scores on the CS170 Dataset
   4.3. Repeated Solutions Experiment
      4.3.1. Clustering
      4.3.2. Student Analysis
   4.4. Validation Against Human Graders
5. Analysis
   5.1. High AI Originality
   5.2. Accuracy vs. Originality
   5.3. Agreement with Human Graders
   5.4. Applications
6. Conclusion
   6.1. Limitations
   6.2. Conclusion
A. ChatGPT Transcripts
   A.1. Simple Solution
   A.2. Creative Solutions
B. Repeated Solutions Experiment Problem
Bibliography

About this Honors Thesis

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Language
  • English