1 Getting started

1.1 What is a programming language

A programming language is a way to use letters and symbols to describe a computational process. Like any language, it can be approached at two levels: syntax, the grammar of the language, which lays down which combinations of letters and symbols are accepted as valid texts in the language, and semantics, the rules and conventions that determine the meaning to be derived from those texts. Compared to human languages, computer languages are much more nitpicky about both of these aspects.
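A small illustration of the distinction, using only Python built-ins. The built-in compile() accepts or rejects a text purely on grammatical grounds without running it (syntax), while the / and // operators show two valid expressions that look alike but mean different things (semantics). The helper name is_valid_syntax is our own.

```python
def is_valid_syntax(source: str) -> bool:
    """Return True if `source` is a grammatically valid Python expression."""
    try:
        # compile() checks the syntax without evaluating anything
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True

print(is_valid_syntax("2 + 3"))  # True: a well-formed expression
print(is_valid_syntax("2 +"))    # False: the expression is incomplete

# Both lines below are valid syntax, but their semantics differ:
print(7 / 2)   # 3.5 (true division)
print(7 // 2)  # 3 (floor division)
```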

A text in a programming language (a program) describes a computational process. Most often, our aim in writing a program is to have a computer actually carry out the process described. Software that makes this happen is called a programming language implementation. We will not go into how implementations work. The Python programming language has multiple implementations; we will be using the most common one, called CPython, which you can download from python.org. Because CPython reads and runs your program directly, without first producing a standalone executable file, it is called the Python interpreter.
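If you are ever unsure which implementation is running your code, the standard library can tell you. A minimal check using the platform and sys modules; the exact output depends on your installation.

```python
import platform
import sys

# Name of the implementation running this code, e.g. "CPython" or "PyPy"
print(platform.python_implementation())

# The same information in lowercase form, e.g. "cpython"
print(sys.implementation.name)

# Version of the Python language this implementation provides
print(platform.python_version())
```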

1.2 Installing Python

To install Python on your system:

1.2.1 On Windows:

1. Visit python.org and navigate to the Downloads section.
2. Select the latest stable version for Windows and download the installer.
3. Run the installer and ensure you check the box “Add Python to PATH” before proceeding.
4. Follow the installation prompts and complete the setup.

1.2.2 On Mac:

1. Visit python.org and go to the Downloads section.
2. Select the latest stable version for macOS and download the package.
3. Open the downloaded package and follow the installation steps.
4. Verify the installation by opening a terminal and typing python3 --version.
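Besides python3 --version in the terminal, you can check the version from inside Python itself, which is handy when a project needs a minimum version. A small sketch; the (3, 8) threshold and the function name are assumptions chosen for illustration, not requirements stated in this chapter.

```python
import sys

# Hypothetical minimum version; use whatever your course or project requires
MINIMUM_VERSION = (3, 8)

def python_is_recent_enough(minimum=MINIMUM_VERSION) -> bool:
    # sys.version_info compares like a tuple: (major, minor, micro, ...)
    return sys.version_info >= minimum

print("Python", sys.version.split()[0])
print("Recent enough:", python_is_recent_enough())
```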

1.3 Interactive Mode vs Script Mode

Python can be used in two main ways:

Interactive Mode: Typing commands one at a time and seeing immediate results. This is useful for experimenting, testing small code snippets, or learning Python.
Script Mode: Writing complete programs in files (scripts) that can be run later. This is how most Python programs are written and is essential for creating reusable code.
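One practical difference between the two modes: the REPL automatically displays the value of every expression you type, while a script only displays what it explicitly prints. A minimal script, saved under the hypothetical name hello.py and run with python hello.py (or python3 hello.py):

```python
# hello.py: a hypothetical script file name.

# In the REPL, typing 2 + 2 displays 4 automatically.
# In a script, a bare expression is evaluated but nothing is shown,
# so the result must be printed explicitly.
result = 2 + 2
print(result)
```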

1.4 The REPL

The Python REPL (Read-Eval-Print Loop) is an interactive environment where you can type Python code and see the results immediately. Its name describes the cycle it repeats: it reads your input, evaluates it, prints the result, and then waits for the next input. The >>> prompt in the terminal indicates that the REPL is ready to accept input. To start the REPL:

1. Open a terminal (or Command Prompt on Windows).
2. Type python (or python3 on some systems) and press Enter.

For example, try typing the following expression and pressing Enter:

2 + 2

The REPL will immediately evaluate the expression and display the result:

4

You can use the REPL for quick calculations or for experimenting with Python syntax. A few things can trip you up, however. If you enter an incomplete line of code, for example one with an unclosed parenthesis, the REPL waits for more input and shows the ... prompt instead of >>>. To abandon the incomplete input, press Ctrl+C; this cancels it and returns you to the >>> prompt.

Additionally, errors in your code will result in Python displaying error messages, known as tracebacks, which provide details about what went wrong and where. For example, if you type print(2 / 0), Python will display a ZeroDivisionError indicating that division by zero is not allowed. These messages help diagnose and fix issues in your code.
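In a script, an unhandled error like this stops the program. The sketch below goes slightly beyond the REPL: it catches the ZeroDivisionError with try/except and uses the standard library's traceback module to capture the traceback text as a string, so you can inspect the same report the REPL would have printed.

```python
import traceback

try:
    print(2 / 0)
except ZeroDivisionError:
    # format_exc() returns the traceback of the exception currently being
    # handled, as a string, in the same format the REPL prints it
    report = traceback.format_exc()

print(report)
```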

The REPL also maintains a history of your previous inputs, which you can navigate using the up and down arrow keys. Pressing the up arrow key cycles through earlier commands, allowing you to quickly reuse or edit them without retyping. The down arrow key lets you move forward through the history if you’ve gone too far back.

To quit the REPL, you can type exit() or quit() and press Enter. Alternatively, you can press Ctrl+D (on Mac or Linux) or Ctrl+Z followed by Enter (on Windows) to exit. This will return you to your system’s terminal or command prompt.
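To make the name concrete, here is a toy version of the loop written in Python itself. It only illustrates the read, evaluate, print cycle described above and is not how the real REPL is implemented; it relies on the built-in eval(), and the function name is our own.

```python
def read_eval_print(inputs):
    """A toy read-eval-print loop over a fixed list of input strings.

    Unlike the real REPL, it accepts expressions only (not statements
    such as assignments), and any error ends the whole loop.
    """
    transcript = []
    for source in inputs:           # Read: take the next line of input
        value = eval(source)        # Eval: compute the expression's value
        # Print: show the value the way the REPL would, using repr()
        transcript.append(f">>> {source}\n{value!r}")
    return transcript

for entry in read_eval_print(["2 + 2", "'py' * 3"]):
    print(entry)
```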

1.5 Virtual Environments

The REPL is not suitable for longer or more complex pieces of code, especially code you need to reuse or maintain over time. The REPL does not save your code to a file, so anything you want to keep must be retyped or copied elsewhere, which makes iterating on and debugging a program impractical. Editing is also cumbersome: you can recall earlier lines with the arrow keys, but revising a multi-line block of code this way is awkward.

For projects that require organization and reusability, it is recommended to create a dedicated directory and set up a virtual environment. Python packages—collections of reusable code—are often required to extend Python’s functionality. These packages are installed using a package manager like pip. However, different projects might need different versions of the same package. A virtual environment provides an isolated space for each project, ensuring that dependencies and package versions do not conflict across projects.

1.5.1 Creating a Directory and Setting Up a Virtual Environment

1.5.2 On Windows:

1. Open a terminal. Windows PowerShell is a good choice: click the Start menu, type “PowerShell,” and select “Windows PowerShell” from the search results. The commands below also work in the older Command Prompt.
2. Navigate to the location where you want to create your project directory, e.g., Documents.
3. Create a new directory:
```
mkdir my_project
cd my_project
```
4. Create a virtual environment (-m venv runs Python's built-in venv module, and the final venv is the name of the directory that will hold the environment):
```
python -m venv venv
```

1.5.3 On Mac:

1. Open a terminal. To do this, use Spotlight by pressing Command + Space, typing “Terminal,” and hitting Enter.
2. Navigate to the location for your project directory, e.g., Documents.
3. Create a new directory:
```
mkdir my_project
cd my_project
```
4. Create a virtual environment (the final venv is the name of the directory that will hold the environment):
```
python3 -m venv venv
```
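The venv command is backed by the standard library's venv module, which can also be called from Python. The sketch below creates an environment in a temporary directory (an assumption made so the example does not touch a real project) and reads back the pyvenv.cfg file that every environment contains. For everyday work, the terminal command above remains the simpler route.

```python
import tempfile
import venv
from pathlib import Path

# A throwaway directory stands in for my_project here
project_dir = Path(tempfile.mkdtemp())
env_dir = project_dir / "venv"

# Roughly equivalent to `python -m venv venv`, except that
# with_pip=False skips installing pip into the new environment
venv.create(env_dir, with_pip=False)

# Every virtual environment contains a pyvenv.cfg file that records
# which base Python installation it was created from
config_text = (env_dir / "pyvenv.cfg").read_text()
print(config_text)
```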

To activate the virtual environment:

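Windows:
```
venv\Scripts\activate
```
If PowerShell refuses to run the activation script because running scripts is disabled, the Python documentation for venv suggests allowing local scripts for your user account with Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser and then activating again.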
Mac:
```
source venv/bin/activate
```

Once the environment is activated, the terminal prompt changes to show it, typically by adding (venv) at the start. To deactivate the environment, type:

deactivate
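Besides watching the prompt, you can ask Python itself whether it is running from a virtual environment. What matters is which python executable runs the code; activation simply puts the environment's python first on your PATH. A minimal check based on the documented behaviour of sys.prefix and sys.base_prefix; the function name is our own.

```python
import sys

def in_virtual_environment() -> bool:
    """Return True if this interpreter is running from a virtual environment."""
    # Inside an environment created by venv, sys.prefix points to the
    # environment directory, while sys.base_prefix still points to the
    # Python installation the environment was created from.
    return sys.prefix != sys.base_prefix

print("Inside a virtual environment:", in_virtual_environment())
print("Python executable:", sys.executable)
```

Running it once with the environment activated and once after deactivate should show the two results differ.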

1.6 Jupyter Notebooks

Jupyter Notebooks provide an interactive environment for writing and running Python code. They are ideal for exploratory data analysis and presentations.

Jupyter Notebooks embody the concept of literate programming, which interleaves code, explanations, and results (including visualizations) in a single document. This makes them particularly effective for documenting your computational process in a clear and reproducible manner. In each notebook you write Python code in cells, add formatted text using Markdown, and display outputs such as tables, graphs, and images right alongside your code and explanations. Jupyter must be installed separately in each virtual environment where you want to use it, but only once per environment.

1.6.1 Activating Your Virtual Environment

Activate the virtual environment before installing or launching Jupyter. This ensures that Jupyter, and the packages your notebooks import, come from that environment rather than from another Python installation on your system.

1.6.2 Installing Jupyter Notebook

To install Jupyter Notebook, use the pip package manager. In your virtual environment, run the following command:

pip install notebook

This will install the necessary packages to run Jupyter Notebook in your environment.
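To confirm that the installation landed in the environment you are using, you can check from Python whether the package is importable. A small sketch using importlib.util.find_spec from the standard library; is_installed is a helper name we chose. Run it with the environment activated; if it reports False for notebook, the package was probably installed into a different Python.

```python
import importlib.util

def is_installed(module_name: str) -> bool:
    """Return True if `module_name` can be imported by this interpreter."""
    # find_spec() returns None when no importable module of that name exists
    return importlib.util.find_spec(module_name) is not None

# "notebook" is the import name of the package installed by `pip install notebook`
print("notebook available:", is_installed("notebook"))
```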

1.6.3 Launching Jupyter Notebook

To start Jupyter Notebook, run the following command:

jupyter notebook

This will open the Jupyter interface in your default web browser. The interface displays a file tree showing the contents of the directory where you executed the command, allowing easy navigation of files and folders.

1.6.4 Starting a New Notebook

From the Jupyter interface, start a new notebook by clicking on “New” > “Python 3 (ipykernel)”. This creates a fresh notebook where you can begin coding. The notebook interface consists of:
- Cells: Editable sections for writing code or text.
- Menu and Toolbar: Options for saving, running code, adding cells, and more.
- Output Area: Displays results, including text, plots, or tables, below executed cells.

You can rename your notebook by clicking on its name (default is “Untitled”) at the top of the window, typing a new name, and pressing Enter.

1.6.5 Using Cells

Jupyter organizes content into cells. A cell contains either code or Markdown-formatted text. To execute a code cell:
- Press Shift + Enter to run the cell and move to the next one.
- Press Ctrl + Enter to run the cell and stay on it.

To insert a new cell, click the + button in the toolbar or press B (which adds a cell below the current one; A adds one above). To delete a cell, use the Edit menu or press D twice (DD). If you’re in edit mode within a cell, press Esc to switch to command mode before using these shortcuts.

1.6.6 Running Code in a Notebook

The Run menu provides additional options for executing code. You can:
- Run all cells in the notebook.
- Run all cells above or below a selected cell.

This flexibility allows you to rerun specific parts of your notebook without running everything from the top.
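These options matter because all cells in a notebook share one running Python process, the kernel: a variable defined in one cell is visible to every cell run after it, regardless of where the cells sit on the page. The two cells below are written as a single listing so they can be copied easily; the temperature values are made up.

```python
# --- Cell 1: define some data (hypothetical example values) ---
temperatures = [21.5, 23.0, 19.8, 24.1]

# --- Cell 2: uses `temperatures` from Cell 1. This works only if
# Cell 1 has already been run in the current kernel session. ---
average = sum(temperatures) / len(temperatures)
average  # in a notebook, the last expression's value is shown as output
```

If you restart the kernel and run only Cell 2, it fails with a NameError, because temperatures no longer exists; running the cells above it first fixes that. Note that the bare average line displays its value only in a notebook or the REPL, not in a script.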

1.6.7 Saving Your Work

Jupyter includes an autosave feature that saves your progress every two minutes by default. The save state is displayed at the top of the interface, indicating if there are unsaved changes. To manually save your work, press Ctrl+S (Windows) or Command+S (Mac).

Notebooks are saved as .ipynb files, which can be easily shared and reopened in Jupyter Notebook.

1.6.8 Adding Markdown and Text

Jupyter supports Markdown for adding formatted text, headings, lists, and links. This is useful for documenting code and writing explanatory notes alongside your computations. For a Markdown tutorial, refer to the Markdown Guide at markdownguide.org.

1.7 Google Colab

Google Colab (short for Colaboratory) is a hosted Jupyter notebook environment that runs entirely in the cloud and requires no local setup. It is particularly valuable for machine learning and data science tasks because it provides free, though limited, access to computing resources, including Graphics Processing Units (GPUs) and Tensor Processing Units (TPUs).

1.7.1 Key Benefits of Google Colab

Free GPU Access: One of the most significant advantages of Colab is free access to NVIDIA GPUs, which can dramatically speed up machine learning computations, especially deep learning tasks. Without Colab, you would need to purchase expensive GPU hardware or pay for cloud computing services.
No Setup Required: Unlike local Jupyter installations, Colab requires no installation or configuration; you only need a Google account to start using it. Many commonly used Python packages for data science and machine learning (such as TensorFlow, PyTorch, and scikit-learn) are pre-installed.
Cloud Storage Integration: Colab integrates seamlessly with Google Drive, making it easy to save your notebooks and access your data files. You can also connect to other cloud storage services.
Collaboration: Multiple people can work on the same notebook simultaneously, similar to Google Docs, making it excellent for team projects or educational settings.

1.7.2 How to Use Google Colab

Accessing Colab:
- Visit colab.research.google.com
- Sign in with your Google account
- Click “New Notebook” to start
Working with GPUs:
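- To enable GPU acceleration, go to Runtime → Change runtime type
- Select a GPU option under Hardware accelerator and save
- Frameworks such as PyTorch and TensorFlow can then use the GPU for supported operations, although in some frameworks (PyTorch, for example) your code must still move models and data to the GPU explicitly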
Managing Packages:
- Common machine learning packages (TensorFlow, PyTorch, scikit-learn, pandas, numpy) are pre-installed
- Install additional packages using the !pip command in a code cell:
```
!pip install package_name
```
- After installing new packages, you may need to restart the runtime for them to work properly
- Package installations are temporary and need to be reinstalled if the runtime resets
Important Considerations:
- Colab sessions have time limits and will disconnect after extended periods of inactivity
- GPU resources are shared and may not always be available
- Your runtime will reset when disconnected, so save important variables and reload necessary data
- Storage is temporary; save important files to Google Drive
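When a notebook may run either on Colab or locally, it can help to check where it is running and whether a GPU is actually attached before starting a long computation. A hedged sketch: it assumes PyTorch as the framework (TensorFlow has its own equivalent check), relies on the google.colab package normally being importable only inside Colab, and the helper names are our own.

```python
def running_in_colab() -> bool:
    """Return True when this code appears to run inside a Google Colab session."""
    try:
        import google.colab  # noqa: F401  (normally only importable on Colab)
    except ImportError:
        return False
    return True


def gpu_available() -> bool:
    """Return True if PyTorch is installed and can see a CUDA GPU."""
    try:
        import torch  # pre-installed on Colab; may be missing locally
    except ImportError:
        return False
    return torch.cuda.is_available()


print("Running in Colab:", running_in_colab())
print("GPU available to PyTorch:", gpu_available())
```

If the second line prints False on Colab, the runtime type is most likely still set to CPU, or no GPU was available when the session started.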

1.7.3 When to Use Colab vs Local Jupyter

Use Colab when:
- You need GPU acceleration for machine learning tasks
- You want to avoid complex local setup
- You’re working on collaborative projects
- You’re learning or prototyping and don’t need persistent computing resources
Stick to local Jupyter when:
- You need consistent, uninterrupted access to computing resources
- You’re working with sensitive data that shouldn’t be uploaded to the cloud
- You need specific package versions or custom environments
- You require longer running times without interruption

For our machine learning exercises, we’ll primarily use Google Colab to take advantage of its GPU acceleration capabilities, which will significantly speed up our model training processes.
