Navigating the Python Development Ecosystem: The Essential Tools and Techniques - An opinionated guide

This article is highly opinionated. These are the tools and workflows that I use (as of 2023). Thus, this is not a list of the best tools out there nor a list of the best way to learn. This is a list that has worked for me and is subject to change.

Context

I was trying to help someone I know get started in Python (and Data Science). He already did some courses in Data Science (and thus has at least a basic understanding), but I realized all the surrounding tools were not explained in the courses he took.

Once I started, I found myself explaining several tools, systems, and apps I use daily. After a couple of hours, I thought it would be good to document this as a series of posts, as this is not the first time, nor the last time, I will be explaining such things.


(Quick-note) SSH Keys Permissions

Context

You want to add an SSH key to your SSH agent and you get the error "Permissions are too open".

 ssh-add ~/.ssh/id_rsa  
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  
@         WARNING: UNPROTECTED PRIVATE KEY FILE!          @  
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  
Permissions 0644 for '/home/daco/.ssh/id_rsa' are too open.  
It is required that your private key files are NOT accessible by others.  
This private key will be ignored.

Solution

If your keys need to be only readable by you:

chmod 400 ~/.ssh/id_rsa

If your keys need to be read-writable by you:

chmod 600 ~/.ssh/id_rsa

After that you can add your key:

ssh-add ~/.ssh/id_rsa
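
If you want to check a key file before calling ssh-add, the rule from the warning above (no access for group or others) can be approximated in a few lines of Python. This is only a sketch: `is_too_open` is a hypothetical helper name, and it is demonstrated on a throwaway temporary file rather than on a real key.

```python
import os
import stat
import tempfile


def is_too_open(path):
    """Return True if group or others have any permission on the file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return bool(mode & (stat.S_IRWXG | stat.S_IRWXO))


# Use a throwaway file so no real key is touched.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    demo_key = tmp.name

os.chmod(demo_key, 0o644)          # readable by group and others
open_before = is_too_open(demo_key)

os.chmod(demo_key, 0o600)          # read-writable by the owner only
open_after = is_too_open(demo_key)

os.remove(demo_key)
print(open_before, open_after)
```

Both 400 and 600 pass this check, because neither gives group or others any access.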

Could not open connection to your authentication agent

If you get this error

 ssh-add ~/.ssh/id_rsa  
Could not open a connection to your authentication agent.

just start the agent in your current shell

eval `ssh-agent`

and then proceed to add your keys.


How to deal with dates, dates with timezones and dates with particular formats with python and pandas (code snippets for different use cases)

Context

When dealing with dates, I sometimes run into problems because the source is not clean, or not all the rows have the same format. Additionally, dates can be simple (year-month-day) or really complicated, like a timestamp with a timezone.

Here are some code snippets for several use cases.

How to read dates with python and pandas

# import pandas
import pandas as pd

# df is a dataframe with a column 'column_with_date' holding strings like '19.01.2023 16:45:46 +01:00'
# convert the date strings to datetimes with pd.to_datetime
pd.to_datetime(df['column_with_date'])

# sometimes the format is a bit unusual and pandas cannot recognize it. In this case we give the date format as an argument.
# for this format 19.01.2023 16:45:46 +01:00 (day.month.year, then time, a space and the UTC offset) we can use:
pd.to_datetime(df['column_with_date'], format='%d.%m.%Y %H:%M:%S %z')

# if your strings have a timezone, use utc=True to get a UTC datetime column
pd.to_datetime(df['column_with_date'], format='%d.%m.%Y %H:%M:%S %z', utc=True)

# sometimes a column mixes objects (strings) and numbers (floats, ints) and to_datetime cannot process it.
# You can force the whole column to strings before giving it to to_datetime.

pd.to_datetime(df['column_with_date'].astype(str), format='%d.%m.%Y %H:%M:%S %z', utc=True)

# pd.to_datetime does not modify the column values in place, so you have to assign the result back to the column.

df['column_with_date'] = pd.to_datetime(df['column_with_date'].astype(str), format='%d.%m.%Y %H:%M:%S %z', utc=True)

# or save it to another (new) column if you want to keep the original value
df['column_with_date_2'] = pd.to_datetime(df['column_with_date'].astype(str), format='%d.%m.%Y %H:%M:%S %z', utc=True)
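
A quick way to check a format string before handing it to pd.to_datetime is to try it on a single value with the standard library, which uses the same format codes. A minimal sketch, assuming the example string from the comments above:

```python
from datetime import datetime, timezone

raw = "19.01.2023 16:45:46 +01:00"
fmt = "%d.%m.%Y %H:%M:%S %z"  # day.month.year, time, space, UTC offset

parsed = datetime.strptime(raw, fmt)      # timezone-aware datetime
as_utc = parsed.astimezone(timezone.utc)  # same instant, expressed in UTC

print(parsed.isoformat())  # 2023-01-19T16:45:46+01:00
print(as_utc.isoformat())  # 2023-01-19T15:45:46+00:00
```

If strptime raises a ValueError here, the format string does not match the data and pandas will most likely complain as well.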

How to save dates with UTC (timezone) to an Excel-File with Pandas

Saving datetime columns with timezones to Excel is not supported by pandas, and you will get the following error if you try:

ValueError: Excel does not support datetimes with timezones. Please ensure that datetimes are timezone unaware before writing to Excel.

To remove the timezone from a date field (column dtype datetime64[ns, UTC]) you can convert each value with .date(). Note that this keeps only the date and drops the time of day.

# remove timezones for excel (keeps only the date part)
df['column_with_date'] = df['column_with_date'].apply(lambda d: pd.to_datetime(d).date())

# save the file as usual
df.to_excel('filename.xlsx')
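
The difference between dropping the whole time part and dropping only the timezone is easy to see on a single value. The sketch below uses the standard library on one made-up timestamp; for a whole pandas column, the counterpart of the second variant is .dt.tz_localize(None).

```python
from datetime import datetime, timezone

# One timezone-aware value, as it would appear in a UTC datetime column.
aware = datetime(2023, 1, 19, 15, 45, 46, tzinfo=timezone.utc)

only_date = aware.date()             # drops the time AND the timezone
naive = aware.replace(tzinfo=None)   # keeps the time, drops only the timezone

print(only_date.isoformat())
print(naive.isoformat())
```

Which one you want depends on whether the time of day matters in the Excel file.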

References

  • https://stackoverflow.com/a/63238008/624088
  • https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes -> format codes for datetime strings
  • trial and error :)

How to set up fingerprint authentication on openSUSE Tumbleweed and KDE Plasma

Context

I have an old Toshiba notebook which has a fingerprint reader, and I wanted to use it to authenticate myself in KDE on openSUSE. Similar steps should work on GNOME and other distros. At the end of the article you can find a reference for KDE running on Debian-based distros.

Check if your fingerprint reader device is supported

Run lsusb and check for any device with "Fingerprint" in its description.

(ds) daco@toshiba:~> lsusb  
Bus 004 Device 003: ID 8086:0189 Intel Corp. Centrino Advanced-N 6230 Bluetooth adapter  
Bus 004 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub  
Bus 004 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub  
Bus 003 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub  
Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub  
Bus 001 Device 020: ID 0930:1314 Toshiba Corp. F5521gw  
Bus 001 Device 004: ID 0bda:58f5 Realtek Semiconductor Corp. 2SF001  
Bus 001 Device 018: ID 08ff:168b AuthenTec, Inc. Fingerprint Sensor
Bus 001 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub  
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

If you find one, copy the ID (in my case it is 08ff:168b) and check whether it is listed at https://fprint.freedesktop.org/supported-devices.html

If it is, follow the next steps 😃

Install requirements

fprintd and fprintd-pam

Use YaST Software or the following command

sudo zypper install fprintd fprintd-pam

Register your fingerprints In KDE

Configure your system (window-manager and KDE lock Screen) to use and allow fingerprint for authentication

Window Manager

kdesu kate /etc/pam.d/sddm

Add the following after #%PAM-1.0

auth    [success=1 new_authtok_reqd=1 default=ignore]   pam_unix.so try_first_pass likeauth nullok
auth    sufficient      pam_fprintd.so

KDE Lock screen

# this will edit or create a file
kdesu kate /etc/pam.d/kde

Then add the following

auth            sufficient      pam_unix.so try_first_pass likeauth nullok
auth            sufficient      pam_fprintd.so

Finally, restart sddm

sudo service sddm restart

Now you should be able to use your fingerprint to log in. Check it out by locking KDE (or press Win + L). There should be a message to use fingerprint or password to unlock/log in.

follow: https://en.opensuse.org/SDB:Using_fingerprint_authentication


How to add a table of contents (TOC) to a markdown post or page to nikola ssg

You want to add an automatically created table of contents to your markdown post in Nikola SSG.

By default, Nikola uses python-markdown, which supports the toc extension.

First you need to enable the markdown extension in conf.py

MARKDOWN_EXTENSIONS = ['markdown.extensions.toc']

Then add the TOC marker where the table of contents should appear in your markdown file

[TOC]


How to find all processes with a specific string in their name and kill them all (in linux command line using ps, grep and awk)

Context

You want to find all the processes of, for example, Visual Studio Code, and kill them all from the command line in Linux.

To find all the processes ids for a particular app you can use the following:

Find all the processes for a particular app

ps -d | grep app_name | awk '{print $1}'

For example, to find all the processes for Visual Studio Code (process name is code), you would type

ps -d | grep code | awk '{print $1}'

Kill all the processes for that particular app

If you want to kill all those processes, just prepend kill and use the ids as arguments

kill $(ps -d | grep code | awk '{print $1}')
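
One thing to keep in mind is that grep matches the string anywhere in the line, so grep code would also catch a process called, say, barcode-reader. The sketch below shows the same filtering in Python with an exact match on the command name. The ps output is a made-up sample and `pids_for` is a hypothetical helper name.

```python
# Made-up sample in the layout printed by `ps -d` (PID TTY TIME CMD).
SAMPLE_PS_OUTPUT = """\
    PID TTY          TIME CMD
   1234 ?        00:00:01 code
   1250 ?        00:00:00 code
   1301 ?        00:00:00 barcode-reader
   2000 pts/0    00:00:00 bash
"""


def pids_for(ps_output, name):
    """Return the PIDs whose command name is exactly `name`."""
    pids = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        fields = line.split(None, 3)         # PID, TTY, TIME, CMD
        if len(fields) == 4 and fields[3].strip() == name:
            pids.append(int(fields[0]))
    return pids


print(pids_for(SAMPLE_PS_OUTPUT, "code"))
```

With the sample above, the barcode-reader process is left alone, while a plain grep code would have included it.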


Business Toolkits for Innovation, Management, and Solving Problems


There are many good and bad concepts, areas, books, methodologies and tools in the business world and, after a while, one starts to forget them. Among those, some which I find useful and/or interesting are:

  • Innovation,

  • Marketing,

  • Digital Marketing,

  • Market Research,

  • Business Strategy,

  • Blue Ocean Strategy,

  • Six Thinking Hats,

  • ReWork,

  • Business Model Canvas.

Sometimes some of these toolkits are really hard to understand and use, while others give you moments like "How didn't I think of this before!".

Whichever the case, there are those which one should at least know of to be able to speak the same language as others in the industry.

"If the only tool you have is a hammer, it is tempting to treat everything as if it were a nail."

Law of the instrument - Abraham Maslow

I'm also a firm believer in the "Law of the instrument" by Abraham Maslow which states, paraphrasing, "if you have just a hammer, everything looks like a nail", so I try to add items to my mental repository (and my notes) regularly.

That said, it is important to highlight that some of these toolkits are designed for early-stage projects, others for startups raising investment and many more for big companies which have to develop their business and improve daily.

Each of these toolkits assumes particular situations and paradigms and was developed to support people and businesses in those situations. If you want to apply them to other situations and paradigms, you have to put in the additional effort required to understand and adapt the toolkit.

If you don't understand under which paradigms a toolkit was developed, you are likely to try to apply it in a less-than-adequate way, which may complicate things instead of simplifying them.

Some evolved to be applied to other areas or industries (such as Lean Manufacturing), while others were simply designed for specific use-cases. Some are multi-faceted and multi-situational toolkits and deliver or explain ways of thinking and approaching problems (e.g. market research methodology or the 6 Thinking Hats), while others describe a series of steps to obtain a particular result (e.g. the Business Model Canvas gives an overview of a project/business at one point in time).

You can of course mix them and use whatever you want from each of them (which is what I always recommend), but for that, you have to first:

  1. know they exist and,

  2. understand them so that you can adapt the useful parts for your use-case.

Introducing new business toolkits into a team (or organization)

When trying a new toolkit in an organization or in your team, there will always be those in favor and those against it. Don't take it personally. If you want to try a new toolkit, it is part of your job to understand why it might not work or might not apply, and it is also your responsibility not to use the toolkit if it creates more problems than it solves.

Given that there are almost always other people involved, chances are you will have a situation where there is strong opposition. There are many reasons why this would be the case, and I will try to list those which I have found, in my experience, to be critical when I have tried to introduce a new toolkit:

  1. You are working with people who hate business and marketing. This is more normal and less discussed than you think. There are many who see the whole business world as work for someone else. I've seen this mostly with hardcore engineers or other really technical people. This is not bad at all if you understand it before trying to implement a new tool or methodology. (The opposite is also true: for some (most) business people, all technology-related stuff is not their problem.)

  2. The company you are working for/with has a way which has always been there, which means there is strong opposition to anything which doesn't align with the *normal way of doing stuff*.

  3. No one has time to learn yet another tool or methodology, especially if the normal way "works".

  4. You are getting into the area or work of someone else without speaking with them first. They may or may not understand what you are trying to use or what you are endeavoring to achieve. In this case, bringing them into the project at the beginning and consulting with them how this new toolkit might improve things works wonders.

  5. You are trying to do something no one asked you for. Yes, sometimes you just get the (inner) call to fix or improve something, or you just like to swim against the flow.

I've found that knowing and using these toolkits has helped me get things done (on projects, endeavors, and daily work), and I'm always trying to add (learn) new toolkits to my repository and modify/mix methodologies and tools.

This article is the first in a series of articles (or at least I hope so) where I will try to summarize and explain such tools and methodologies along with some of my personal notes. Why? Because it helps me remember them and because I like to 😃

Here is a list of the articles I'm planning to write or have already written. If there is any toolkit I missed, please let me know.

And these are the articles I plan to write:

  • SRI's 5 disciplines of Innovation

  • Blue Ocean Strategy

  • Business Model Canvas

  • 6 Thinking Hats

  • Nudge (book)

  • Market research methodologies

  • Design Thinking

  • How to do meetings (according to my experience)

  • Diffusion of innovations (by Everett Rogers) and how it impacts you.

  • Productivity tools and methodologies: a review and comparison according to my experience

  • The art of innovation (by Guy Kawasaki)

  • Getting things done and Zen to do

  • Elevator Pitch

  • The 5 Stages of Customer Awareness and its impact on marketing

  • Love marks

  • Making Ideas Happen

What is Git (version control system) and why you need to use it for your projects (any project with files and content, not just code)

When you work on something for more than a couple of hours and across several days or months, you are bound to try new things, ways of writing, code, images and so on. And while doing so, you will probably start creating many files with different names for the same content so that you can have versions, such as Text-1.txt, texts-1-1.txt, project-a-final.docx, project-a-final-2.docx, and so on. If you have a more structured mindset you may even use a time stamp in your file names in the form of project-a-2022-11-12-1211.docx or similar. After a couple of tries, you have tens or even hundreds of files and you won’t even know which one was the correct one. This gets even more complicated if you also want to collaborate with others (and not depend on Google Docs or similar, or if your project cannot be done in such tools).

Here a version control system (aka VCS) comes to help. Instead of manually creating files and giving them a name, saving them somewhere and trying to find out which one you really wanted to use, or which parts of which files, you may as well use a VCS, in this case Git, to manage the whole thing. If you use it the right way, you will even have a nice history of your changes.

A VCS, also sometimes known as an SCM (source code management) or RCS (revision control system), is a system which tracks changes to a file or a set of files. Usually, developers use a VCS to track changes to code and collaborate with others, but software development is not the only use case.

Types of version control systems

There are several kinds of VCS, and they are usually categorized as either centralized or distributed.

Centralized VCS

A centralized VCS works on a client-server model, where a centralized master code base is in a server, and a developer or collaborator can lock (check-out) a piece of this code (version) and work on it. In this case everything is controlled by the server.

The best known examples are CVS and Subversion (open source) and IBM ClearCase (commercial).

Distributed VCS

A distributed VCS works on a peer-to-peer model, where a code base or project is distributed among the individual developers’ devices and the entire history as well as the versions are mirrored on each system.

In this case the emphasis is on changes instead of versions, so that any version (branch) is a combination of many sets of changes (commits).

The main examples here are Git and Mercurial.

In fact, Git's slogan is "everything is local", which emphasizes the distributed part.

Git (the system we will use)

We will discuss Git, which is a distributed version control system.

Git (/ɡɪt/) is free and open source software for distributed version control: tracking changes in any set of files, usually used for coordinating work among programmers collaboratively developing source code during software development. -Wikipedia

What is git in simple words?

Git is a version control system (VCS) which allows you to track changes to a file or set of files over time.

In other words, Git is a tool or system that allows you to add files to a tracked repository (a folder which was initialized or configured as a Git Repository), commit changes to the repository with a message stating which changes were made and, optionally, push those changes to a centralized server to collaborate with others and pull them to get the changes others made.

Within this repository you can have different branches (as the branches of a tree), each of which can have a specific version (combination of changes) of a file or set of files. This means you can have a main or master branch (think of this as the tree trunk) from which several branches (or none) can grow (or be created).

The commits and their messages build the log, journal or history of a repository and each of its branches.
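
To make the relationship between commits, branches and history more tangible, here is a toy model in Python. It is only a sketch of the idea and not how Git stores data internally: the `Commit` class, the `history` function and the commit messages are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    """A set of changes with a message and a link to the previous commit."""
    message: str
    parent: Optional["Commit"] = None


def history(tip):
    """Walk from a branch tip back to the first commit, newest first."""
    messages = []
    while tip is not None:
        messages.append(tip.message)
        tip = tip.parent
    return messages


first = Commit("add outline")
second = Commit("write chapter 1", parent=first)

# A branch is just a name pointing at a commit.
branches = {"main": second}

# A new branch starts from the current tip of main and grows on its own.
branches["idea"] = Commit("try a new intro", parent=branches["main"])

print(history(branches["main"]))
print(history(branches["idea"]))
```

Both branches share the first two commits; only the "idea" branch contains the third one until it is merged back.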

Usually what you want is to have as few branches as possible: ideally only as many as the projects, ideas or issues you are currently working on.

You can think of it like the branches of a bush or garden tree. The more the branches grow and in more directions, the more complicated it is to prune the tree and maintain a shape. The same goes for repositories. The more branches you have the more the branches differ from each other, making it more complicated to get back to the master branch.

Each branch will grow or stagnate independently, and the way to get the changes from one branch to another is to merge them. You usually merge two branches using a merge request, where you can check the differences and changes to each file between two branches.

Once you are ok with the merge request (aka MR), you can approve it, meaning you will mix (merge) the content of each file into a new version of the file on the destination branch (usually master or develop), or you can reject it and just discard the changes. You may even remove the whole branch or leave it there for historical purposes.

Use cases

As explained before, a vcs is a way to track files (not only code) and you can use it for almost anything (videos and really big files might be a bad idea though, although there is a solution for that called Git Large File Storage or Git LFS).

I have used git to track the changes of several books written in markdown and asciidoc, this site (written in markdown, restructured text and python), as well as personal and professional programming projects and notes. It can even be used as an alternative to Dropbox (check git-annex). There are ways for designers to use git to track changes to designs, parsers for office files, and many others.

The main point here is that git is not only for tracking code.

How to use Git

You may use git on the command line, through a desktop graphical user interface or through a web-based management platform. There are several platforms to choose from, but the most popular ones are:

  • GitHub (SaaS, now part of Microsoft)
  • GitLab (SaaS and open source)
  • Gitea (open source, self-hosted)
  • Codeberg (Gitea as non-profit SaaS, backed by the non-profit Codeberg e.V. in Germany)
  • BitBucket (commercial)

The web-based management platforms also add several features which are not part of git (or any vcs) but are really useful (mostly for software projects but also for other kinds of projects).

For example they usually include:

  • Issue tracking (integrated with git)
  • Continuous Integration/Continuous Delivery (CI/CD)
  • Merge request and comments
  • Approval or rejection of merge requests
  • ACLs or access control lists for repositories based on roles and permissions
  • A web editor for code with syntax-highlighting

Depending on which platform you use to create and manage your repository, the way to do the same thing may differ, and the tool itself is really flexible.

Important: You don’t need any web platform to use git, as it can run completely locally.

Git on the command line

There are many commands in git, but with the following you can get started:

# initialize a repository (folder)
git init

# stage a single file for the next commit
git add file

# stage all files in this folder
git add .

# commit the staged changes
git commit -m "short message describing the changes"

Git on GitHub

Go to https://github.com and create a new account to get started.

Git on GitLab

Go to https://gitlab.com and create a new account to get started.

Git workflow: GitFlow

There are many ways of using git (workflows or branching strategies) which one could follow.

There are already excellent comparisons which you can refer to if you want to learn more (check https://www.devbridge.com/articles/branching-strategies-git-flow-vs-trunk-based-development/ and https://www.flagship.io/git-branching-strategies/).

I personally use Git-Flow.

Article updates

  • 2022-12-23: added links for git web platforms, added syntax highlighting for bash and links for git workflows and social share image

How to back up your Mac config and apps from Terminal (using brew, conda, pip and mackup)

When you are using your Mac daily you begin to use several apps almost without thinking. But when you have to migrate to a new Mac (or Linux) you realize how many apps and custom settings you are missing.

Context

I want to back up my Mac settings and be able to use or restore them on the same or another Mac. I say Mac because:

  1. brew is mainly used on Mac (but is also available on Linux and WSL) and
  2. mackup is mainly for macOS (but should also support Linux)

We will back up the following tool sets:

  • brew packages as a list of installed packages
  • conda environments
  • all apps supported by mackup
  • pip packages

Brew

Export apps to a Brewfile

brew bundle dump # dump the app list to the current directory

brew bundle dump --file=~/Brewfile # dump the app list to a specific location 

Import (install) apps from a Brewfile

brew bundle --file=~/.private/Brewfile

Conda

Export 1 (one) conda environment (env)

To export just one environment you can do the following:

conda activate my_env
conda env export > my_env.yml
conda deactivate

Export all conda environments (env) in one command

But usually you have several environments (at least I do) and you would want to export all of them at the same time. In that case, you may do the following:

# take the first column of `conda env list` and skip the comment lines starting with "#"
for env in $(conda env list | cut -d" " -f1); do
   if [[ ${env:0:1} == "#" ]] ; then continue; fi;
   conda env export -n "$env" > "${env}.yml"
done
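
The filtering step of that loop (take the first column, skip comment lines) can be sketched in Python as well, which makes it easier to see which names end up being exported. The listing below is a made-up sample of conda env list output with hypothetical paths, and `env_names` is a hypothetical helper name.

```python
# Made-up sample of `conda env list` output (paths are hypothetical).
SAMPLE_LISTING = """\
# conda environments:
#
base                  *  /opt/miniconda3
my_env                   /opt/miniconda3/envs/my_env
"""


def env_names(listing):
    """Return the environment names from a `conda env list` style listing."""
    names = []
    for line in listing.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if line[0].isspace():
            continue  # skip entries that have no name, only a path
        names.append(line.split()[0])
    return names


print(env_names(SAMPLE_LISTING))
```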

Create a conda env from a YAML file

The export loop above writes one .yml file per environment, named after the environment.

To create an environment from one of those .yml files:

conda env create -f environment.yml

Mackup

mackup backup # to backup
mackup restore # to restore

PIP

To export all the dependencies in a python environment

pip freeze > requirements.txt

To import (or install) all the dependencies from a text file

pip install -r requirements.txt


How to remove a network with active endpoints in Docker

Problem: ERROR: error while removing network

I wanted to run docker-compose down but it failed due to the following error:

ERROR: error while removing network: network <your_network> id cfcb4a603426f2cf71b1f971a9ecb0aae7e6c889a8dc4c55bfd1eb010d8a260b has active endpoints

Solution

To solve this, just run docker network inspect <your_network> with the right permissions. (You may also use sudo if you have the permission.)

The output of that command is something like this:

[
    {
        "Name": "your_network",
        "Id": "cfcb4a603426f2cf71b1f971a9ecb0aae7e6c889a8dc4c55bfd1eb010d8a260b",
        "Created": "2021-05-05T10:58:08.216143067+02:00",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "10.0.38.0/24",
                    "Gateway": "10.0.38.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": true,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "cf8f158e34d07bb3be8ff73e21fc688dce1ba13a5b941a7a59aff1373a74be8f": {
                "Name": "phpmyadmin_phpmyadmin_1",
                "EndpointID": "e77d58d75ee31a37ee2cced2658c36480c4f209d8eedf29d4d281f8073457eb1",
                "MacAddress": "02:42:0a:00:26:05",
                "IPv4Address": "10.0.38.5/24",
                "IPv6Address": ""
            }
        },
        "Options": {},
        "Labels": {
            "com.docker.compose.network": "default",
            "com.docker.compose.project": "your_project",
            "com.docker.compose.version": "1.28.5"
        }
    }
]

In this case the network is your_network and the endpoint is the value under Containers -> Name.

Now you can use sudo docker network disconnect -f your_network phpmyadmin_phpmyadmin_1. This will disconnect the container from the network, and you will be able to run docker-compose down without problems.
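
If several containers are still attached, reading the names out of the JSON by hand gets tedious. The sketch below builds one disconnect command per attached container from the inspect output. It only produces the command strings and does not run anything; the JSON is a trimmed version of the output above (the container ID is shortened for the sample) and `disconnect_commands` is a hypothetical helper name.

```python
import json

# Trimmed sample of `docker network inspect` output (ID shortened).
INSPECT_OUTPUT = """
[
  {
    "Name": "your_network",
    "Containers": {
      "cf8f158e34d0": {"Name": "phpmyadmin_phpmyadmin_1"}
    }
  }
]
"""


def disconnect_commands(inspect_json):
    """Return one `docker network disconnect` command per attached container."""
    commands = []
    for network in json.loads(inspect_json):
        containers = network.get("Containers") or {}
        for container in containers.values():
            commands.append(
                f"docker network disconnect -f {network['Name']} {container['Name']}"
            )
    return commands


for command in disconnect_commands(INSPECT_OUTPUT):
    print(command)
```

Review the printed commands before running them (with sudo if needed), since -f forces the disconnect.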

Source: This answer in Stack Overflow