Intro to Databases: Understanding OLAP, OLTP, Embeddable, and Standalone

TLDR

Databases are essential if you work with computers, varying in type and purpose. OLAP and OLTP represent different data processing approaches, while embeddable and standalone databases differ in their integration with applications. This article serves as an introductory guide to these concepts.

Introduction to Databases

If you have ever worked with computers, you have probably heard of databases. They are the (usually correct) way for storing and managing data efficiently. While some (of us) still use Spreadsheets as a database, there are many types of actual databases, each with its own purpose and advantages.

Databases are used in almost every application, from simple blogs running WordPress to complex ERPs. They are used in banking, e-commerce, and even in your smartphone.

That said, databases are pivotal in storing and managing data efficiently if you work with data. With various types available, understanding the basics, like OLAP, OLTP, embeddable, and standalone databases, is important, but regrettably not enough, for choosing the right database for your needs. but we need to start somewhere.

Why this article?

A few weeks ago I found out about DuckDB, an embeddable SQL OLAP database that can be used with Python (and many other). I was curious about its positioning as an OLAP database, and I found a mention to this paper in Reddit where I found this graphic:

Systems Landscape from the paper "DuckDB: an Embeddable Analytical Database"

And I thought interesting to dive a bit deeper into the concepts of OLAP, OLTP, embeddable, and standalone databases.

So here we are.

What is OLAP?

OLAP (Online Analytical Processing) is a type of database designed for complex queries, data analysis, and reporting. It's often used in business intelligence and data warehousing, allowing users to analyze data from multiple perspectives.

OLAP is an approach to answer multidimensional analytical (MDA) queries swiftly in computing. Typical applications of OLAP include business reporting for sales, marketing, management reporting, business process management (BPM), budgeting and forecasting, financial reporting and similar areas. OLAP tools enable users to analyze multidimensional data interactively from multiple perspectives.

-- Automatic Summary from Wikipedia using this script

An example of an OLAP database is a data warehouse that stores historical data from multiple sources. It allows users to analyze data from different perspectives, like sales by region, product, or time.

Database that are in this category are:

  • Amazon Redshift: A fully managed, petabyte-scale data warehouse service in the cloud.
  • Snowflake: A cloud-based data platform that provides a data warehouse as a service designed for the cloud.
  • BigQuery: A serverless, highly scalable, and cost-effective cloud data warehouse designed for business agility.
  • DuckDB: An embeddable SQL OLAP database that can be used with Python and many other languages.
  • ClickHouse: An open-source OLAP database management system developed by Yandex. It is column-oriented and allows to generate analytical reports using SQL queries in real-time.

What is OLTP?

OLTP (Online transaction processing) databases are optimized for managing transaction-oriented applications, like e-commerce systems, banking, and reservation systems.

Online transaction processing (OLTP) is a type of database system used in transaction-oriented applications, such as many operational systems. "Online" refers to that such systems are expected to respond to user requests and process them in real-time (process transactions) The term is contrasted with online analytical processing ( OLAP) which instead focuses on data analysis.

-- Automatic Summary from Wikipedia using this script

An example of an OLTP database is a banking system that stores customer information and transactions. It allows users to perform transactions like deposits, withdrawals, and transfers.

Database that are in this category are:

  • MySQL : An open-source relational database management system.
  • MariaDB: An open-source relational database management system, and a fork of MySQL before being acquired by Oracle.
  • PostgreSQL: An open-source object-relational database management system.
  • Oracle: A relational database management system developed by Oracle Corporation.
  • SQL Server: A relational database management system developed by Microsoft.
  • SQLite: A relational database management system that is embedded into the end program.

What is Embeddable?

Embeddable databases can be directly integrated into an application's software. They run within the application’s process, making them ideal for desktop applications, small-scale web applications, or for use in portable devices.

Database that are in this category are:

  • SQLite: A relational database management system that is embedded into the end program.
  • DuckDB: An embeddable SQL OLAP database that can be used with Python and many other languages.
  • H2: An embeddable RDBMS written in Java.
  • Firebird: An open-source SQL relational database management system.
  • HSQLDB: An open-source relational database management system written in Java.

What is Standalone?

Standalone databases operate separately from the application. They require their server process and are accessed over a network. This type is suitable for large-scale applications that require robust database management systems.

Database that are in this category are:

  • MySQL : An open-source relational database management system.
  • MariaDB: An open-source relational database management system, and a fork of MySQL before being acquired by Oracle.
  • PostgreSQL: An open-source object-relational database management system.

Understanding the differences

OLAP vs OLTP

OLAP and OLTP are two different approaches to data processing. OLAP is used for complex queries and data analysis, while OLTP is used for transaction-oriented applications.

Transaction-oriented means that the database is optimized for handling transactions, which are a set of operations that must be executed as a whole. For example, when you buy something online, the transaction is the whole process of adding the item to the cart, entering your payment information, and confirming the purchase.

OLAP databases are optimized for complex queries and data analysis. They are used in business intelligence and data warehousing, allowing users to analyze data from multiple perspectives.

Embeddable vs Standalone

Embeddable databases can be directly integrated into an application's software. They run within the application’s process, making them ideal for desktop applications, small-scale web applications, or for use in portable devices. But they also have a sweet spot in data science and machine learning applications, where they can be used to store data without the need for a separate database server.

Standalone databases operate separately from the application. They require their server process and are accessed over a network. This type is suitable for large-scale applications that require robust database management systems. In data science and machine learning applications, they are used for storing large amounts of data, and you can also run them using Docker, if you want to avoid the whole setup process.

With embeddable databases, you can store data in a single file, which is easier to manage and move around, and you just need to install the database driver to use it. With standalone databases, you need to install the database server, the client and everything in between.

Conclusion

Understanding these basic database concepts is crucial but not enough for selecting the right database type for your specific needs. Each type offers unique advantages depending on the application's requirements and your use case.

References

Comments

Comments powered by Disqus