Databases

How to choose

Fake Data for Development / Testing

Database-first design

Database-first design is an approach to software development where the database schema is designed and implemented before the application code. In this approach, the database structure and relationships between tables are established first, and then the application is developed based on the existing database schema.

The process typically involves the following steps:

Database Design: The database designer analyzes the requirements of the system and designs the database schema accordingly. This includes identifying entities, defining tables, establishing relationships, and setting up constraints such as primary keys, foreign keys, and indexes.
Database Implementation: Once the database design is finalized, the database administrator or developer implements the schema using a database management system (DBMS) such as MySQL, Oracle, or Microsoft SQL Server. Tables are created, relationships are established, and any necessary constraints are applied.
Application Development: With the database schema in place, the application developers can start building the software that interacts with the database. They write code to connect to the database, retrieve data, perform CRUD operations (create, read, update, delete), and handle business logic.

Advantages of Database-First Design:

Data Integrity: By designing the database schema first, you can ensure that the structure of the data is well-defined and follows appropriate normalization techniques. This helps maintain data integrity and reduces the risk of data inconsistencies.
Efficient Querying: A well-designed database schema can optimize data retrieval and querying operations. By understanding the database structure upfront, developers can leverage the power of relational databases to write efficient SQL queries.
Separation of Concerns: Database-first design promotes the separation of concerns between data storage and application logic. The database schema serves as a stable foundation that can be accessed and modified by multiple applications or services.
Team Collaboration: Database-first design facilitates collaboration between developers and database administrators. The database schema acts as a common reference point and allows both teams to work in parallel, focusing on their respective areas of expertise.

However, it's important to note that database-first design may not be suitable for all projects or development scenarios. Alternative approaches like code-first or model-first design can also be used depending on the specific requirements and preferences of the development team.

Non-Relational SQL

skills don't transfer between them

MongoDB
Redis
Cassandra
Neo4J

Azure Cosmos DB

NoSQL

Performance
Scalability

NewSQL

Google Spanner

Relational

Once you learn one, skills transfer to others

Oracle Database
MS SQL Server
IBM DB2
PostgreSQL
MySQL

How to choose

The database structure basically is all about how you choose to store and retrieve the data at the time of the need. Choosing the right database depends on finding the appropriate structure to deal with the type of application data. Often by selecting the database with appropriate structure, you can reduce the development efforts to a minimum.
The database size refers to the capacity of the database for storage and retrieval. A database with advanced partition system can easily store data across multiple systems and servers.
Database speed and scalability refers to the time taken by the database in reading and writing the data. While there are databases characterized to serve read-heavy apps better, there are several others that are appropriate for write-heavy apps.

Database Types	Description
Relational database model	This model classifies data through tables, and each table consists of rows. Data keys identify these tables and their constituent rows. These keys help relating multiple tables of data and help to access data as per the contexts or requirements.
Document-based	This database model basically stores data in a structure made up of fields with simple or complex values. In such a database, each document can be represented through different fields. Such database by supporting complex hierarchies of documents allows embedding them inside the apps. This type of data model is not limited to fixed schemas are preferred by developers for flexibility.
KeyValue-based	Key/Value-based database systems are known for simplicity and easier accessibility. As per this system, data values are stored with unique keys.
Column-based	Column-based databases work much like the relational database systems and keep data stored in rows and columns. The columns of data can have multiple values. The data can be fetched either by row or column as per the required type of data.
Graph	Graph-based database model actually stores data entities governed by their internal relationships. For easy navigation across various data entities and retrieving data through the relationships, this storage type comes with great help.

Comparison

MYSQL/MARIADB

Well established, stable and reliable
Lots of libraries, frameworks and tools to leverage
Data is rigid, well structured and defined, leading to less data mismatch bugs
Scaling requires a lot of domain-specific knowledge

MONGODB

No tables, no formal schema, non-relational
Easier to scale than SQL-based databases
Easy ramp up and iteration of database model
Easier to shoot yourself in the foot

Table of Contents ⬆️

what is a GUID

GUID stands for "Globally Unique Identifier." In software engineering, a GUID is a unique identifier that is used to identify a particular object or entity in a system. A GUID is typically a 128-bit value that is generated using an algorithm.

GUIDs are useful in situations where it is important to ensure that two objects or entities in a system do not have the same identifier. This could occur, for example, if two users create accounts on a website at the same time and both are assigned the same ID number. By using GUIDs instead of sequential or random numbers, the probability of a collision is extremely low, making them ideal for situations where uniqueness is critical.

GUIDs are often used in database systems, where they can be used as primary keys for tables. They are also used in distributed systems, where multiple servers need to communicate with each other and keep track of unique entities.

Table of Contents ⬆️

data source | }

CASE Expressions

CAST
CONNECT BY
DATETIME
DATETIME CONVERSION
EXTRACT

Hierarchal Queries

Best Practices

there is a user called dbo in every database
there is a schema called dbo in every db

ACID

Atomicity
Consistently
Isolation
Durability

Business Logic

If you allow an environment where individual developers can put business logic in the business object layer or in a stored procedure as they see fit, a large application becomes VERY difficult to understand and maintain.
Stored procedures are great for speeding up certain DB operations. My architectural decision is to leave all logic in the business layer of the application and employ stored procedures in a targeted manner to improve performance where benchmarking indicates it is warranted.

Stored Procedures vs User Defined Functions

SP

independently called with EXEC statement
cannot JOIN SPs
can modify SQL Server configuration
can use nondeterministic functions such as GETDATE()

UDF (User Defined Functions)

3 Broad Categories

Database Changing Procedures

series of (mostly INSERT, UPDATE, DELETE) operations that perform a unitary business task

Computation-only functions

if-else logic operations

Lookup Functions

execute queries against the database
routines that take 0+ parameters, perform an operation and return the result
scalar
table-valued
System
called from within another SQL statement
can JOIN UDFs
can't modify SQL Server configuration
Always stops execution of T-SQL code if error occurs

Security

entity IDs should be randomly generated UUIDs, not sequential. This helps secure resources by making IDs much harder to guess.

Indexes

Indexes Indexes are special lookup tables that the database search engine can use to speed up data retrieval.

Removing an Index

Creating an Index

SELECT vs UPDATE performance speed based on index

Query Plans

Query plans, also known as execution plans, are a fundamental component of database systems. They are generated by the database optimizer and provide a detailed outline of how a database system will execute a specific query or command to retrieve or modify data. Query plans are essential for understanding how the database engine will access and manipulate data to fulfill the query requirements efficiently. Here are some key aspects of query plans:

Logical and Physical Operations: A query plan describes the sequence of logical and physical operations that the database engine will perform to execute a query. Logical operations, such as filtering, joining, sorting, or grouping, represent the high-level steps required to process the query. Physical operations detail how those logical operations will be implemented, such as using specific indexes, scan methods, or join algorithms.
Access Methods: Query plans outline the access methods the database engine will employ to retrieve or modify data. This includes determining whether to use indexes, table scans, or a combination of both to access the required data efficiently. The plan may indicate the specific indexes or table partitioning schemes that will be utilized.
Join Strategies: For queries involving joins between multiple tables, query plans indicate the join strategies selected by the database engine. This includes choosing between nested loop joins, merge joins, or hash joins based on factors such as table sizes, available indexes, and join conditions.
Data Transformations: Query plans may include data transformations that are necessary to perform aggregations, projections, or other operations on the retrieved data. These transformations can involve sorting, grouping, or applying functions to the data before presenting the final result set.
Cost Estimations: Query plans often include cost estimations associated with different plan alternatives. The optimizer evaluates various plan options and estimates their relative costs, such as CPU and I/O operations. This helps the optimizer choose the most efficient plan based on cost-based optimization principles.
Parallel Execution: In parallel database systems, query plans may indicate the use of parallelism to execute the query across multiple CPU cores or nodes. This allows for concurrent processing of data, which can significantly improve query performance for large datasets.

Query plans provide valuable insights into how a database system executes a query and can help identify potential performance bottlenecks or areas for optimization. Database administrators, developers, and performance analysts often analyze query plans to understand how queries are processed and optimize their database schema, indexing strategies, or query formulations for better performance.

Data Centric
Logic Centric
Actual Plan
Estimated Plan

Table of Contents​

Fake Data for Development / Testing​

Database-first design​

Non-Relational SQL​

skills don't transfer between them​

Azure Cosmos DB​

NoSQL​

NewSQL​

Relational​

Once you learn one, skills transfer to others​

How to choose​

Comparison​

MYSQL/MARIADB​

MONGODB​

what is a GUID​

CASE Expressions​

Hierarchal Queries​

Best Practices​

ACID​

Business Logic​

Stored Procedures vs User Defined Functions​

SP​

UDF (User Defined Functions)​

Database Changing Procedures​

Computation-only functions​

Lookup Functions​

Security​

Indexes​

Removing an Index​

Creating an Index​

Query Plans​

Table of Contents