Navigating the Python Development Ecosystem: The Essential Tools and Techniques - An opinionated guide

This article is highly opinionated. These are the tools and workflows that I use (as of 2023). Thus, this is not a list of the best tools out there nor a list of the best way to learn. This is a list that has worked for me and is subject to change.

Context

I was trying to help someone I know get started in Python (and Data Science). He already did some courses in Data Science (and thus has at least a basic understanding), but I realized all the surrounding tools were not explained in the courses he took.

Once I started, I found myself explaining several tools, systems, and apps I use daily. After a couple of hours, I thought it would be good to document this as a series of posts, as this is not the first time, nor the last time, I will be explaining such things.

Base operating system

In my opinion, the operating system is the most important tool out there. It allows you to interact with your computer and limits (so to speak) what you can and can't do with it.

There are 3 main operating systems:

  • Microsoft Windows

  • Apple macOS

  • GNU/Linux distributions

If you want to understand the ins and outs of everything, I really recommend GNU/Linux. It is the main operating system on servers and in containers (check the table of contents of this article), its kernel powers Android, and it shows up in many other places. It runs almost everywhere and usually needs fewer resources than the other options.

To get started I recommend Linux Mint, Ubuntu Linux and openSUSE Tumbleweed (in that order).

I use openSUSE Tumbleweed because it is stable and a rolling release, Linux Mint on the desktop because it just works, and Ubuntu and Debian on servers. I own a Mac but now I rarely use it, and I have a dual-boot with Windows just in case I need it.

If you want something that just works, but still is compatible with Linux (and its terminal), you should go with macOS (if your budget allows).

If you are in a Microsoft environment (meaning you depend on OneDrive, Microsoft Excel with advanced macros, devices that only run on Windows and so on) you should go with Windows. For development in Windows 11 you should try WSL2, which is a way to run Linux in Windows.

Tools for developers

Let’s start with the tools I use.

Git

The first one is git. I already wrote an article about git here: What is Git (version control system) and why you need to use it for your projects (any project with files and content, not just code). If you don’t know what git is, I recommend reading it.

What is git?
Git is a version control system (VCS) which allows you to track changes to a file or set of files over time.

Why is this great, important and cool? It allows you to collaborate on projects with other people, to track different versions, to undo mistakes, to see the code history, and to try new things without breaking everything. Just read the article or check out the Git Documentation.

Development workspace

A development workspace is an environment that provides all the necessary tools and dependencies required to build, test, and deploy applications, hopefully in a way that is versionable and replicable.

To create a development workspace, one can use Virtual Machines (VMs), Containers, or LXC (Linux Containers). VMs are a good choice for creating isolated environments that can run different operating systems, while containers provide a lightweight and portable solution that can be easily shared across multiple machines. LXC, on the other hand, provides a way to create and manage system-level containers that can run multiple isolated Linux systems on a single host.

Let’s check the main ones.

VMs (Virtual Machines)

Virtual machines are another way to isolate a development environment (in addition to, or as an alternative to, Conda and virtualenv).

Virtual machines are usually those where you can install a whole operating system. The most common ones are Xen, KVM, VirtualBox, QEMU and VMware.

Of those, Xen, KVM, VirtualBox and QEMU are open-source projects, while VMware is a commercial product.

VirtualBox comes with a CLI and a GUI and works under Linux, Mac, and Windows.

To use Xen, KVM and QEMU you need to use the terminal or an additional GUI, such as virt-manager (libvirt), or a shell tool such as virsh (also libvirt) under Linux.

There are a couple of important concepts you should know:

  • base operating system: your main operating system (OS). On a Mac it is typically macOS, on most desktops and notebooks it is Windows, and on some others it is a Linux distribution (such as Ubuntu, Debian, Fedora, openSUSE, Manjaro, Linux Mint, etc…).

  • guest operating system: the system you are installing on the virtual machine. For example, you can install Ubuntu (Linux).

  • extensions: a way to integrate your base and guest operating systems to make their use easier. For example, you can sync the clipboards, mount directories, share files, record the screen and so on.

Virtualbox

If you want to try different operating systems or Linux distributions, or you wish to have setups or workspaces for specific projects, then VirtualBox is the easiest way to do that.

While there are alternatives out there, VirtualBox runs almost everywhere and is pretty well integrated into the base OS.

LXC, Docker or Podman

Linux containers, Docker containers, and Podman containers are all ways to package software applications and their dependencies into self-contained units (thus containers, because they contain something) that can be easily shared, run, and managed across different computing environments. They provide a more efficient and reliable way to deploy and manage software applications, while also improving security and portability.

In other words, it is a way to avoid mixing different versions of packages and libraries in your base OS. This allows you to develop and test software using different versions (for example, Python 2.7 and Python 3.10) without installing those specific versions in your system. Because a container is self-contained, you can share these environments and others (when correctly configured) can use them without many problems.

In a way, this is similar to Conda environments.

Of the following, the most used alternative right now is Docker (together with docker-compose).

| Containerization Technology | Supported Operating Systems | URL | Short Description | Main Company | Intercompatibility |
| --- | --- | --- | --- | --- | --- |
| LXC | Linux | https://linuxcontainers.org/lxc/ | LXC is an OS-level virtualization method that allows running multiple isolated Linux systems (containers) on a single Linux host. Containers are managed with the LXC command-line tools. | Canonical | Compatible with Docker images. |
| Docker | Linux, Windows, macOS | https://www.docker.com/ | Docker is a platform for building, shipping, and running distributed applications in containers. It provides a way to create, deploy, and manage containers using a client-server architecture, where the Docker daemon manages the containers and the Docker CLI controls the daemon. | Docker, Inc. | Compatible with Podman and Kubernetes. |
| Podman | Linux (macOS and Windows through a Linux virtual machine) | https://podman.io/ | Podman is a daemonless container engine that allows running containers as regular users without requiring a daemon running as a privileged process. It provides a command-line interface for managing containers and images, and can run Docker images as well. | Red Hat | Compatible with Docker images and Kubernetes. |

Terminal/ command line/ shell

As you dive into programming, you will find yourself using the terminal more and more.

The terms terminal, command line and shell refer in this article to the same thing: a text-based interface to a computer (local or remote), where you type in commands and get text-based responses, rather than clicking on icons and windows with a graphical user interface.

There are many reasons to use the terminal, but you will eventually find that using the command line for some tasks is easier and faster.

For example, you can:

  • navigate your file system

  • use docker

  • use ssh to access remote machines

  • perform file operations (copy, move, remove, sync, etc.)

By default, most GNU/Linux systems come with bash. Newer versions of macOS and some Linux distributions use zsh, and Windows uses cmd or PowerShell.
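If you are not sure which shell your system gives you by default, a few lines of Python can tell you. This is only a sketch: it assumes the SHELL environment variable on Linux/macOS and COMSPEC on Windows, which describe the configured default shell and not necessarily the one you are typing in right now. The function name default_shell is my own.

```python
import os
from pathlib import PurePosixPath, PureWindowsPath


def default_shell(environ, os_name):
    """Return the file name of the configured default shell.

    Assumption: SHELL (POSIX) and COMSPEC (Windows) are set by the system.
    If they are missing, fall back to the usual minimal defaults.
    """
    if os_name == "nt":
        # On Windows, COMSPEC normally points at cmd.exe.
        return PureWindowsPath(environ.get("COMSPEC", "cmd.exe")).name
    # On Linux and macOS, SHELL holds the user's login shell.
    return PurePosixPath(environ.get("SHELL", "/bin/sh")).name


# Report the default shell of the machine this script runs on.
print(default_shell(os.environ, os.name))
```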

Pump up my shell

A bare shell is enough at first, but you may improve your shell experience with some apps and plugins.

There are mainly 3 ways to improve your shell:

  1. Install zsh (another shell similar to bash but newer) and install some add-ons such as ohmyzsh, zinit or similar.

  2. Install another shell such as fish

  3. Install a universal app to improve your shell such as starship.

I have used ohmyzsh and zinit on zsh with great results, but because bash is almost everywhere by default, I started to use starship.

Here is a quick comparison table of the three:

| Shell | Description | URL | OS Support | Features |
| --- | --- | --- | --- | --- |
| zsh + ohmyzsh | Oh My Zsh is a community-driven framework for managing your Zsh configuration. It comes bundled with several helpful functions, helpers, plugins, and themes that make it easy to customize your terminal experience. | https://github.com/ohmyzsh/ohmyzsh | Windows, Linux, macOS | Syntax highlighting, Auto-suggestion, Git aliases, Plugin manager |
| bash + starship | Starship is a minimal, blazing-fast, and infinitely customizable prompt for any shell. It provides useful information about the current directory, git branch and status, as well as customizable prompts and icons. | https://starship.rs/ | Windows, Linux, macOS | Syntax highlighting, Auto-suggestion, Git integration, Customizable prompt |
| fish + ohmyfish | Oh My Fish is a fast, fully-featured framework for the fish shell. It includes a wide range of plugins and themes that can be easily installed and customized. Fish is designed to be simple and easy to use, with a clean syntax and powerful features. | https://github.com/oh-my-fish/oh-my-fish | Windows, Linux, macOS | Autosuggestions, Abbreviations, Syntax highlighting, Plugin manager, Theme manager |

Tools for Python

Virtual environments

A virtual environment is a way to isolate a specific version of Python together with a set of modules in specific versions.

Why is this important?

Python and Python packages (for example pandas, requests, etc.) are always in development, and they add, remove or change some features.

Let’s take pandas as an example: the recommended way to get a single row/column value has shifted over time from df.loc [1] or df.iloc toward the scalar accessor df.at [2], and other methods have been deprecated or removed outright between releases.

With virtual environments, you can run your project with pandas 1.4, for example, while you update your code to work with pandas 1.5 (in another virtual environment).
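To make this concrete, here is a minimal sketch of a check you could run at the top of a project to see whether the active environment has the version you expect. The helper names (parse_version, installed_version, is_compatible) are my own, and the version parsing is deliberately naive: it only looks at the leading numbers and ignores pre-release tags.

```python
import re
from importlib import metadata


def parse_version(text):
    """Turn a dotted version such as '1.5.3' into a tuple of ints: (1, 5, 3)."""
    numbers = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def installed_version(package):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def is_compatible(installed, minimum, below):
    """True if minimum <= installed < below, e.g. '1.4' <= x < '1.5'."""
    return parse_version(minimum) <= parse_version(installed) < parse_version(below)


# Check the environment this script is running in.
version = installed_version("pandas")
if version is None:
    print("pandas is not installed in this environment")
else:
    print("pandas", version, "fits the 1.4 series:", is_compatible(version, "1.4", "1.5"))
```

Running the same script in two different virtual environments should give you two different answers, which is exactly the point of having them.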

The virtual environment is just the part which isolates your project from the base system (in this case, from the Python packages of your main operating system). Additionally, you need a way to add packages and modules. This is called a package manager.

In Python, the two usual ones are Conda and pip.

Pip is generally used together with virtualenv.

Conda includes its own way to manage virtual environments, and may also use pip for the packages not available in its main package repository or in conda-forge.

I use both of them depending on what I’m doing.

For example, after installing Conda you can create an environment and add packages to it. In this example, I will create an environment called ds (for data science) and add pandas and Jupyter Lab to it.

# create an environment in conda

conda create --name ds

# activate environment

conda activate ds

# install pandas and jupyter lab from conda-forge

conda install -c conda-forge jupyterlab pandas

# now you can run jupyter lab
jupyter lab

In comparison with virtualenv, you can activate a Conda environment from any path.

There are 4 main differences between Conda and virtualenv:

  1. Creation of a virtual environment: You have to give virtualenv a path where it should create the new environment, while Conda uses the same root folder for all of them (in Linux, ~/miniconda3 by default).

  2. Activation of the virtual environment: For virtualenv you first need to activate the environment using the path to it, and then use pip to install packages. Every time you want to activate this virtual environment, you have to give the path (of course you could create shell aliases and stuff like that, but I find Conda to be easier).

  3. Different commands in different operating systems: Conda always has the same way of creating and activating the virtual environment. On the other hand, with virtualenv you have to change the way you activate it depending on the operating system you are using. Check this link for more information.

  4. Python and other packages: in Conda you can install more than just Python packages, and it also offers pre-compiled (binary) Python packages. For example, in Conda you can also install R and R packages.
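The path and operating system differences in the list above can be sketched in Python itself, because the standard library ships a venv module. This is only an illustration: the function names and the example paths are hypothetical, and I skip pip (with_pip=False) to keep the example fast.

```python
import tempfile
import venv
from pathlib import Path, PurePosixPath, PureWindowsPath


def create_environment(env_dir):
    """Create a virtual environment at the given path (you always pass a path)."""
    venv.create(env_dir, with_pip=False)
    # Every venv contains a pyvenv.cfg file at its root.
    return Path(env_dir) / "pyvenv.cfg"


def activation_command(env_dir, shell):
    """Return the activation command, which depends on the shell and OS."""
    posix = PurePosixPath(env_dir)
    windows = PureWindowsPath(env_dir)
    commands = {
        "bash": f"source {posix / 'bin' / 'activate'}",
        "zsh": f"source {posix / 'bin' / 'activate'}",
        "fish": f"source {posix / 'bin' / 'activate.fish'}",
        "cmd": str(windows / "Scripts" / "activate.bat"),
        "powershell": str(windows / "Scripts" / "Activate.ps1"),
    }
    return commands[shell]


# Create a throwaway environment in a temporary folder and show one command.
with tempfile.TemporaryDirectory() as tmp:
    config = create_environment(Path(tmp) / "ds")
    print("created:", config.exists())

print(activation_command("/path/to/ds", "fish"))
```

With Conda none of this branching is needed, since conda activate ds is the same everywhere.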

The same example as before, but using venv (the subset of virtualenv that ships with Python), is:

# create a new virtual environment
python3 -m venv /path/to/new/virtual/environment

# activate environment, where <venv> is the path/to/new/virtual/environment
source <venv>/bin/activate # in bash/zsh
source <venv>/bin/activate.fish # in fish
<venv>/bin/Activate.ps1 # in PowerShell on Linux/macOS
<venv>\Scripts\Activate.ps1 # in PowerShell on Windows
<venv>\Scripts\activate.bat # in cmd.exe

# install jupyter lab and pandas
pip install jupyterlab pandas

# now you can run jupyter lab
jupyter lab
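If you ever lose track of which environment is active, you can ask Python. The sketch below assumes two things: that a venv/virtualenv makes sys.prefix differ from sys.base_prefix, and that conda activate sets the CONDA_DEFAULT_ENV variable. The function name describe_environment is my own.

```python
import os
import sys


def describe_environment(prefix, base_prefix, environ):
    """Describe which kind of environment the interpreter is running in."""
    # Inside a venv/virtualenv, sys.prefix points at the environment folder
    # while sys.base_prefix still points at the original installation.
    if prefix != base_prefix:
        return f"virtual environment at {prefix}"
    # Assumption: conda exports the active environment name in this variable.
    conda_env = environ.get("CONDA_DEFAULT_ENV")
    if conda_env:
        return f"conda environment '{conda_env}'"
    return "no virtual environment (system Python)"


print(describe_environment(sys.prefix, sys.base_prefix, os.environ))
```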

miniconda

What is miniconda?
Miniconda is a free minimal installer for Conda. It is a small, bootstrap version of Anaconda that includes only Conda, Python, the packages they depend on, and a small number of other useful packages, including pip, zlib and a few others.
— [https://docs.conda.io/en/latest/miniconda.html]

virtualenv/ venv

virtualenv is a tool to create isolated Python environments. Since Python 3.3, a subset of it has been integrated into the standard library under the venv module. The venv module does not offer all features of this library, to name just a few more prominent.
— https://docs.python.org/3/library/venv.html

You usually use venv or virtualenv for deployment, as it is supported almost everywhere. For example, to deploy a Python-based static site to GitHub Pages, GitLab Pages or Cloudflare Pages, you will typically need a requirements.txt, the file where all the dependencies are listed so that pip can install them into the environment.
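In practice you would normally generate that file with pip freeze. To show what ends up inside it, here is a rough Python sketch that produces the same kind of name==version lines from the packages installed in the current environment. The helper names are hypothetical and the output is not guaranteed to match pip freeze exactly.

```python
from importlib import metadata


def format_requirements(packages):
    """Turn (name, version) pairs into sorted, pinned requirement lines."""
    # Deduplicate by lowercase name, then sort alphabetically.
    unique = {name.lower(): (name, version) for name, version in packages}
    return [f"{name}=={version}" for _, (name, version) in sorted(unique.items())]


def installed_packages():
    """List (name, version) for every distribution in this environment."""
    return [
        (dist.metadata["Name"], dist.version)
        for dist in metadata.distributions()
        if dist.metadata["Name"]  # skip broken installs without a name
    ]


# Print what a requirements.txt for the current environment could look like.
for line in format_requirements(installed_packages()):
    print(line)
```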

poetry

I tried poetry, but I stopped. I don’t really remember why, but I’m ok for now using conda, virtualenv and docker.

I might check it again in the future.

If you want to check it out, visit https://python-poetry.org/

IDEs (Integrated Development Environments)

An IDE is an application that provides a comprehensive and integrated set of tools for software development (although I also use them for writing this blog because of git integration, autocompletion and syntax-highlighting). It is typically used by programmers and developers to write, test, and debug software code more efficiently and effectively.

An IDE typically includes

  1. a code editor that provides features such as

    1. syntax highlighting,

    2. code completion, and

    3. automatic indentation

to help developers write code more quickly and accurately.

It may also include a debugger that allows developers to step through their code and identify and fix errors more easily.

Other features of an IDE may include

  1. version control integration,

  2. command-line integration

  3. project management tools, and

  4. support for multiple programming languages.

Lately, some of them also include AI add-ons such as GitHub Copilot.

Using AI to code is cool, but has its dangers. The code does not always work, nor does it do what you want. I wouldn’t recommend using such tools unless you can fix the code or, at least, understand what it does. Don’t just blindly trust auto-generated code. You have been warned.

A list of IDEs for Python is available at Wikipedia

I use 3 IDEs depending on the task at hand. To try new things I go with Jupyter Lab; for big projects (and at work), and for committing to git, I go with PyCharm; and for everything else, including remote coding from an iPhone, I go with VS Code. You don’t have to decide on one, but it helps if you have at least the same keybindings.

PyCharm

PyCharm is an integrated development environment (IDE) used for programming in Python. It provides code analysis, a graphical debugger, an integrated unit tester, integration with version control systems, and supports web development with Django. PyCharm is developed by the Czech company JetBrains.
— Wikipedia

I have to say that the integration of Git in JetBrains products (I also use PhpStorm) is the best I have seen.

PyCharm works really well and has everything you would want to use, but it is a little heavy on resource use (in comparison with VS Code, for example).

There are 2 versions:

  1. Community edition

  2. Commercial edition

The main differences, besides price, are in the number of programming languages it supports, the integrated framework support (for example Django) and remote interpreters.

If you are new to this, the community version will suffice for your needs.

You can find a comparison here at JetBrains

Visual Studio Code (vscode or code)

Visual Studio Code is an open-source project from Microsoft. It runs almost everywhere and has a lot of extensions. It has built-in git support and IntelliSense.[3]

You can install it on Linux, Windows, and Mac, or run it on a headless server and access it through a browser. It is also integrated in GitHub Codespaces, available on the web, and in many other places.

If you want a lightweight and customizable IDE, but at the same time flexible and powerful, give VS Code a try.

Jupyter Notebook / JupyterLab

Jupyter Notebook/Lab is a web-based interactive IDE, mostly used in data science projects. Its main “work unit” is a Jupyter notebook (called a document), which allows you to get inline results: you can see the results of part of your code without running all of it.
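Under the hood, a notebook document is just a JSON file made of cells, which is why part of your code can run on its own. The sketch below builds a minimal notebook by hand. It assumes the version 4 notebook format with only the fields I believe are required; the function name and the cell contents are made up for illustration, and in real projects you would let Jupyter write the file for you.

```python
import json


def make_notebook(sources):
    """Build a minimal notebook (format version 4) with one code cell per source."""
    cells = [
        {
            "cell_type": "code",
            "execution_count": None,  # not run yet
            "metadata": {},
            "outputs": [],            # inline results would be stored here
            "source": source,
        }
        for source in sources
    ]
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


# Two independent cells: each one can be executed separately in Jupyter.
notebook = make_notebook(["x = 21", "x * 2"])
text = json.dumps(notebook, indent=1)
print(text)
```

Saving that text to a file ending in .ipynb should give you something Jupyter can open.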

JupyterLab is the latest web-based interactive development environment for notebooks, code, and data. Its flexible interface allows users to configure and arrange workflows in data science, scientific computing, computational journalism, and machine learning. A modular design invites extensions to expand and enrich functionality.
— jupyter.org

It’s important to note that Jupyter supports many programming languages through kernels. You can use, as far as I know, C++, Julia, Octave, R, Ruby, Python, and PHP. There are many more kernels available, as it is an open-source and community-driven project. Here you can find a list of supported kernels.

Key takeaways

  1. Operating systems (OS) allow you to interact with your computer and limit what you can do with it. The main OS are Microsoft Windows, Apple macOS, and GNU/Linux.

  2. Git is a version control system that allows you to track changes to a file or set of files over time. It’s essential for collaboration on projects, tracking different versions, undoing mistakes, and trying new things.

  3. Development workspaces provide all the necessary tools and dependencies required to build, test, and deploy applications. They can be created using Virtual Machines (VMs), Containers, or LXC (Linux Containers).

  4. VirtualBox is an easy way to try different operating systems and setups. Containers are a way to package software applications and their dependencies into self-contained units that can be easily shared, run, and managed across different computing environments.

  5. LXC, Docker, and Podman are popular containerization technologies. LXC allows running multiple isolated Linux systems on a single Linux host, while Docker is a platform for building, shipping, and running distributed applications in containers. Podman is a daemonless container engine that allows running containers as regular users without requiring a daemon running as a privileged process.


[3] Go beyond syntax highlighting and autocomplete with IntelliSense, which provides smart completions based on variable types, function definitions, and imported modules. (https://code.visualstudio.com/)
