Databases
Table of Contents
Fake Data for Development / Testing
Database-first design
Database-first design is an approach to software development where the database schema is designed and implemented before the application code. In this approach, the database structure and relationships between tables are established first, and then the application is developed based on the existing database schema.
The process typically involves the following steps:
-
Database Design: The database designer analyzes the requirements of the system and designs the database schema accordingly. This includes identifying entities, defining tables, establishing relationships, and setting up constraints such as primary keys, foreign keys, and indexes.
-
Database Implementation: Once the database design is finalized, the database administrator or developer implements the schema using a database management system (DBMS) such as MySQL, Oracle, or Microsoft SQL Server. Tables are created, relationships are established, and any necessary constraints are applied.
-
Application Development: With the database schema in place, the application developers can start building the software that interacts with the database. They write code to connect to the database, retrieve data, perform CRUD operations (create, read, update, delete), and handle business logic.
Advantages of Database-First Design:
-
Data Integrity: By designing the database schema first, you can ensure that the structure of the data is well-defined and follows appropriate normalization techniques. This helps maintain data integrity and reduces the risk of data inconsistencies.
-
Efficient Querying: A well-designed database schema can optimize data retrieval and querying operations. By understanding the database structure upfront, developers can leverage the power of relational databases to write efficient SQL queries.
-
Separation of Concerns: Database-first design promotes the separation of concerns between data storage and application logic. The database schema serves as a stable foundation that can be accessed and modified by multiple applications or services.
-
Team Collaboration: Database-first design facilitates collaboration between developers and database administrators. The database schema acts as a common reference point and allows both teams to work in parallel, focusing on their respective areas of expertise.
However, it's important to note that database-first design may not be suitable for all projects or development scenarios. Alternative approaches like code-first or model-first design can also be used depending on the specific requirements and preferences of the development team.
Non-Relational SQL
skills don't transfer between them
- MongoDB
- Redis
- Cassandra
- Neo4J
Azure Cosmos DB
NoSQL
- Performance
- Scalability
NewSQL
- Google Spanner
Relational
Once you learn one, skills transfer to others
- Oracle Database
- MS SQL Server
- IBM DB2
- PostgreSQL
- MySQL
How to choose
- The database structure basically is all about how you choose to store and retrieve the data at the time of the need. Choosing the right database depends on finding the appropriate structure to deal with the type of application data. Often by selecting the database with appropriate structure, you can reduce the development efforts to a minimum.
- The database size refers to the capacity of the database for storage and retrieval. A database with advanced partition system can easily store data across multiple systems and servers.
- Database speed and scalability refers to the time taken by the database in reading and writing the data. While there are databases characterized to serve read-heavy apps better, there are several others that are appropriate for write-heavy apps.
| Database Types | Description |
|---|---|
| Relational database model | This model classifies data through tables, and each table consists of rows. Data keys identify these tables and their constituent rows. These keys help relating multiple tables of data and help to access data as per the contexts or requirements. |
| Document-based | This database model basically stores data in a structure made up of fields with simple or complex values. In such a database, each document can be represented through different fields. Such database by supporting complex hierarchies of documents allows embedding them inside the apps. This type of data model is not limited to fixed schemas are preferred by developers for flexibility. |
| KeyValue-based | Key/Value-based database systems are known for simplicity and easier accessibility. As per this system, data values are stored with unique keys. |
| Column-based | Column-based databases work much like the relational database systems and keep data stored in rows and columns. The columns of data can have multiple values. The data can be fetched either by row or column as per the required type of data. |
| Graph | Graph-based database model actually stores data entities governed by their internal relationships. For easy navigation across various data entities and retrieving data through the relationships, this storage type comes with great help. |
Comparison
MYSQL/MARIADB
- Well established, stable and reliable
- Lots of libraries, frameworks and tools to leverage
- Data is rigid, well structured and defined, leading to less data mismatch bugs
- Scaling requires a lot of domain-specific knowledge
MONGODB
- No tables, no formal schema, non-relational
- Easier to scale than SQL-based databases
- Easy ramp up and iteration of database model
- Easier to shoot yourself in the foot
what is a GUID
GUID stands for "Globally Unique Identifier." In software engineering, a GUID is a unique identifier that is used to identify a particular object or entity in a system. A GUID is typically a 128-bit value that is generated using an algorithm.
GUIDs are useful in situations where it is important to ensure that two objects or entities in a system do not have the same identifier. This could occur, for example, if two users create accounts on a website at the same time and both are assigned the same ID number. By using GUIDs instead of sequential or random numbers, the probability of a collision is extremely low, making them ideal for situations where uniqueness is critical.
GUIDs are often used in database systems, where they can be used as primary keys for tables. They are also used in distributed systems, where multiple servers need to communicate with each other and keep track of unique entities.
data source |
}
CASE Expressions
- CAST
- CONNECT BY
- DATETIME
- DATETIME CONVERSION
- EXTRACT
Hierarchal Queries
Best Practices
- there is a user called dbo in every database
- there is a schema called dbo in every db
ACID
- Atomicity
- Consistently
- Isolation
- Durability
Business Logic
-
If you allow an environment where individual developers can put business logic in the business object layer or in a stored procedure as they see fit, a large application becomes VERY difficult to understand and maintain.
-
Stored procedures are great for speeding up certain DB operations. My architectural decision is to leave all logic in the business layer of the application and employ stored procedures in a targeted manner to improve performance where benchmarking indicates it is warranted.
Stored Procedures vs User Defined Functions
SP
- independently called with EXEC statement
- cannot JOIN SPs
- can modify SQL Server configuration
- can use nondeterministic functions such as GETDATE()
UDF (User Defined Functions)
- 3 Broad Categories
Database Changing Procedures
- series of (mostly INSERT, UPDATE, DELETE) operations that perform a unitary business task
Computation-only functions
- if-else logic operations
Lookup Functions
- execute queries against the database
- routines that take 0+ parameters, perform an operation and return the result
- scalar
- table-valued
- System
- called from within another SQL statement
- can JOIN UDFs
- can't modify SQL Server configuration
- Always stops execution of T-SQL code if error occurs
Security
- entity IDs should be randomly generated UUIDs, not sequential. This helps secure resources by making IDs much harder to guess.
Indexes
Indexes Indexes are special lookup tables that the database search engine can use to speed up data retrieval.
Removing an Index
Creating an Index
SELECT vs UPDATE performance speed based on index
Query Plans
Query plans, also known as execution plans, are a fundamental component of database systems. They are generated by the database optimizer and provide a detailed outline of how a database system will execute a specific query or command to retrieve or modify data. Query plans are essential for understanding how the database engine will access and manipulate data to fulfill the query requirements efficiently. Here are some key aspects of query plans:
-
Logical and Physical Operations: A query plan describes the sequence of logical and physical operations that the database engine will perform to execute a query. Logical operations, such as filtering, joining, sorting, or grouping, represent the high-level steps required to process the query. Physical operations detail how those logical operations will be implemented, such as using specific indexes, scan methods, or join algorithms.
-
Access Methods: Query plans outline the access methods the database engine will employ to retrieve or modify data. This includes determining whether to use indexes, table scans, or a combination of both to access the required data efficiently. The plan may indicate the specific indexes or table partitioning schemes that will be utilized.
-
Join Strategies: For queries involving joins between multiple tables, query plans indicate the join strategies selected by the database engine. This includes choosing between nested loop joins, merge joins, or hash joins based on factors such as table sizes, available indexes, and join conditions.
-
Data Transformations: Query plans may include data transformations that are necessary to perform aggregations, projections, or other operations on the retrieved data. These transformations can involve sorting, grouping, or applying functions to the data before presenting the final result set.
-
Cost Estimations: Query plans often include cost estimations associated with different plan alternatives. The optimizer evaluates various plan options and estimates their relative costs, such as CPU and I/O operations. This helps the optimizer choose the most efficient plan based on cost-based optimization principles.
-
Parallel Execution: In parallel database systems, query plans may indicate the use of parallelism to execute the query across multiple CPU cores or nodes. This allows for concurrent processing of data, which can significantly improve query performance for large datasets.
Query plans provide valuable insights into how a database system executes a query and can help identify potential performance bottlenecks or areas for optimization. Database administrators, developers, and performance analysts often analyze query plans to understand how queries are processed and optimize their database schema, indexing strategies, or query formulations for better performance.
- Data Centric
- Logic Centric
- Actual Plan
- Estimated Plan