
This repository contains an analysis of programming language preferences among developers based on age groups and experience levels, utilizing data from the StackOverflow Developer Survey.


TrueCodee/StackOverflow-Developer-Survey-Analysis


Programming Language Preferences Analysis

Overview

This project investigates how programming language preferences vary across age groups and experience levels among developers, based on Stack Overflow Developer Survey data. The findings contribute to understanding the evolving software development landscape and how skill requirements for developers change over time.

Contents

  1. Introduction: Outlines the project's objectives, focusing on the analysis of programming language preferences among developers of different ages and experience levels.
  2. Data Retrieval & Cleaning: Details how the Stack Overflow dataset was obtained, including the library imports and the initial data cleaning steps.
  3. Data Processing & EDA (Exploratory Data Analysis): Describes the preprocessing steps and the exploratory analysis conducted to understand the distribution of programming language preferences among different demographic groups.
  4. Model Application: Discusses the application of logistic regression, random forest, and AdaBoost models to infer and predict programming language preferences based on demographic and professional data.
  5. Synthesis of Inference and Prediction: Provides insights from the model findings, highlighting how age, experience, and other factors influence programming language preferences.
  6. Conclusion: Summarizes the project's findings and their implications for educational strategies, community support initiatives, and future research directions.
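As an illustration of the Model Application step, the following is a minimal sketch of fitting the three model families named above with Scikit-Learn. It uses synthetic data from make_classification as a stand-in for the encoded survey features, so it does not reproduce the notebook's results. The function name compare_models, the hyperparameters, and the train/test split are assumptions made for this example and are not taken from the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def compare_models(random_state=0):
    """Fit three classifiers on synthetic data and return their test accuracy."""
    # Synthetic stand-in for encoded demographic features (NOT the survey data).
    X, y = make_classification(
        n_samples=500, n_features=8, n_informative=4, random_state=random_state
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=random_state
    )

    # Hyperparameters are illustrative assumptions, not the notebook's settings.
    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "random_forest": RandomForestClassifier(
            n_estimators=100, random_state=random_state
        ),
        "adaboost": AdaBoostClassifier(n_estimators=50, random_state=random_state),
    }

    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


if __name__ == "__main__":
    for name, acc in compare_models().items():
        print(f"{name}: {acc:.3f}")
```

Keeping the models in a dictionary makes it easy to evaluate all three on the same split, which is one simple way to compare them side by side.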

Datasets

The analysis is based on datasets provided by Stack Overflow, including user profiles, questions, answers, comments, and tag information. These datasets are publicly available through Stack Overflow's annual data release.

Technologies Used

  1. Python
  2. Jupyter Notebook
  3. Pandas
  4. Matplotlib
  5. Seaborn
  6. Scikit-Learn
  7. Statsmodels
  8. Imbalanced-Learn

Results

Our analysis reveals significant insights into programming language preferences across different demographic segments. Detailed findings and visualizations are available in the Jupyter Notebook included in this repository.

Future Work

Future directions involve extending the analysis to incorporate more variables, employing longitudinal data, and considering the impacts of emerging technologies on developer preferences.

How to Run

  1. Clone the repository
  2. Install the dependencies listed under Technologies Used
  3. Run jupyter notebook to start the Jupyter server
  4. Open the Final_Project_Analysis.ipynb notebook
  5. Execute the cells in order to reproduce the analysis
  6. Alternatively, view Final_Project_Analysis.html for a static HTML export of the notebook.
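Before opening the notebook, it can help to confirm that the dependencies are importable. The sketch below is one way to do that using only the standard library. The import names in REQUIRED are the usual ones for these packages and are an assumption here, since the repository does not list them. The helper missing_dependencies is a hypothetical name introduced for this example.

```python
from importlib.util import find_spec

# Display name -> import name. The import names are assumed from common usage;
# they are not stated in this repository.
REQUIRED = {
    "Jupyter Notebook": "notebook",
    "Pandas": "pandas",
    "Matplotlib": "matplotlib",
    "Seaborn": "seaborn",
    "Scikit-Learn": "sklearn",
    "Statsmodels": "statsmodels",
    "Imbalanced-Learn": "imblearn",
}


def missing_dependencies(required=REQUIRED):
    """Return the display names of packages that cannot be found for import."""
    return [
        label for label, module in required.items() if find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed dependencies appear to be installed.")
```

The check only looks up each package without importing it, so it runs quickly and does not verify versions.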

Contributors

  1. Aryan Jain
  2. Devesh Talreja
  3. Micah Billington
  4. Rupesh Rangwani

License

This project is licensed under the MIT License. See the LICENSE file for details.
