This repository builds a learning path to Data Science and Machine Learning. It primarily focuses on the tasks below, which form the base of any Data Science or Machine Learning project.
- Gathering or Extracting the data
- Exploring the Data, popularly called Exploratory Data Analysis
- Data Preparation
- Data Visualizations
- Model building, Training, Evaluation, Tuning and Outcome
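As a rough illustration of how these tasks chain together, here is a toy sketch that uses only the Python standard library, so it runs before any package is installed. The inline data, column names and function names are all hypothetical and chosen for this example; the notebooks themselves use NumPy, Pandas and Matplotlib for the same steps, and the visualization step is left out here.

```python
import csv
import io
import statistics

# Hypothetical inline data standing in for a real dataset file.
RAW_CSV = """hours,score
1,52
2,55
3,
4,61
5,64
"""


def gather(text):
    """Gathering: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def explore(rows):
    """Exploration: count the rows and the missing values per column."""
    missing = {col: sum(1 for r in rows if not r[col]) for col in rows[0]}
    return {"rows": len(rows), "missing": missing}


def prepare(rows):
    """Preparation: drop incomplete rows and convert strings to floats."""
    return [
        (float(r["hours"]), float(r["score"]))
        for r in rows
        if all(r.values())
    ]


def fit_line(points):
    """Model building: least-squares fit of y = slope * x + intercept."""
    xs, ys = zip(*points)
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


rows = gather(RAW_CSV)
summary = explore(rows)
clean = prepare(rows)
slope, intercept = fit_line(clean)

print(summary)
print(f"score = {slope:.1f} * hours + {intercept:.1f}")
```

The exploration step is what reveals the row with a missing score, and the preparation step then drops it before the model is fitted.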
Prior knowledge of the Python programming language is expected for this learning path. If not, then visit
Since this is a learning phase for me in Data Science, all the contents are built from Concepts, Examples and Problem Solving Exercises. The topics covered are:
- NumPy, for Numerical Computation and Analysis
- Pandas, for Data extraction and Preparation
- Matplotlib and Seaborn, for Data Visualizations
We require Python and the following Python libraries installed:
We will also need software installed to run and execute a Jupyter Notebook.
If you do not have Python installed yet, it is highly recommended that you install the Anaconda distribution of Python, which already has the above packages and more included.
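To see whether an existing Python environment already has what the notebooks need, a quick check along these lines may help. It is a minimal sketch: the package names are assumptions based on the libraries named in this README (NumPy, Pandas, Matplotlib, Seaborn) plus `notebook` for Jupyter, and `missing_packages` is a hypothetical helper, not part of this repository.

```python
import importlib.util

# Assumed import names for the libraries mentioned in this README;
# adjust the list to match what the notebooks actually import.
REQUIRED = ["numpy", "pandas", "matplotlib", "seaborn", "notebook"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All listed packages were found.")
```

The check only looks the packages up without importing them, so it stays fast and does not fail when one of them is absent.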
The Jupyter Notebook files can be run loosely in the order below.
- NumPy
- Pandas
- Matplotlib
In a terminal or command window, navigate to the package-specific project directory (say: NumPy, Pandas, Matplotlib...) and run one of the following commands:
ipython notebook <file_name>.ipynb
or
jupyter notebook <file_name>.ipynb
This will open the Jupyter Notebook software and project file in your browser.
Multiple datasets are used to implement the different facets of Data Science and ML. All the required datasets are available in the Data Repository.
This will always be a Work in Progress.