
Delete Commit History

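remove node_modules from git history

These commands stop tracking files that .gitignore now covers. They do not rewrite earlier commits, so node_modules remains in the existing history.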

git rm -r --cached .
git add .
git commit -m "remove gitignore files"
git push
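
The effect of the commands above can be checked in a throwaway repository. This is only a sketch: the temporary directory, file names, and commit identity are made up for illustration, and "git push" is left out because no remote exists here.

```shell
set -eu
repo=$(mktemp -d)             # hypothetical scratch repository
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Example"

# Simulate a repository where node_modules was committed by mistake.
mkdir -p node_modules/pkg
echo "module.exports = 1" > node_modules/pkg/index.js
echo "console.log('app')" > app.js
git add .
git commit -q -m "initial commit (node_modules tracked by mistake)"

# Ignore the directory, then rebuild the index so ignored files drop out.
echo "node_modules/" > .gitignore
git rm -r -q --cached .
git add .
git commit -q -m "remove gitignore files"

tracked=$(git ls-files)       # files tracked after the cleanup
echo "$tracked"

# Earlier commits still contain node_modules.
commits_touching=$(git rev-list --count HEAD -- node_modules)
echo "commits touching node_modules: $commits_touching"
```

The files stay on disk and in the older commits; only the index (and every commit from now on) stops including them.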

orphan branch

git checkout --orphan <branch_name>
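
One common way to use an orphan branch to discard commit history is sketched below. The branch name "fresh-start", the scratch repository, and the commit messages are assumptions for illustration, not part of the original notes.

```shell
set -eu
repo=$(mktemp -d)             # hypothetical scratch repository
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Example"

echo one > file.txt
git add file.txt
git commit -q -m "first"
echo two >> file.txt
git commit -q -am "second"
old_branch=$(git symbolic-ref --short HEAD)

# Start a branch with no parent commits; working tree and index are kept.
git checkout -q --orphan fresh-start
git add -A
git commit -q -m "Initial commit"

# Replace the old branch with the new single-commit history.
git branch -D "$old_branch" >/dev/null
git branch -m "$old_branch"

commit_count=$(git rev-list --count HEAD)
echo "commits on $old_branch: $commit_count"
```

Publishing the result requires a force push (for example "git push -f origin <branch_name>"), which rewrites history for everyone who has cloned the repository.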

Reduce the size of the .git folder

Command | Result

attempt 2
git reflog expire --all --expire=now
git gc --prune=now --aggressive

Fail 1
git repack -a -d -f --depth=250 --window=250
git remote prune origin
git repack
git prune-packed
git reflog expire --expire=1.month.ago
git gc --aggressive
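
To judge whether an attempt actually shrinks the repository, it helps to measure before and after. The sketch below does this with "git count-objects" around the "attempt 2" commands, in a scratch repository invented for the example; in a real repository only the "git count-objects -vH" calls would be needed.

```shell
set -eu
repo=$(mktemp -d)             # hypothetical scratch repository
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Example"
git config gc.auto 0          # keep automatic gc out of the measurement

# Create a few commits so the repository holds loose objects.
for i in 1 2 3 4 5; do
  echo "revision $i" > data.txt
  git add data.txt
  git commit -q -m "revision $i"
done

# Helper (not a Git command): number of loose objects.
loose_count() { git count-objects -v | awk '/^count:/ { print $2 }'; }

loose_before=$(loose_count)
git count-objects -vH         # human-readable sizes before cleanup

git reflog expire --all --expire=now
git gc -q --prune=now --aggressive

loose_after=$(loose_count)
git count-objects -vH         # sizes after cleanup
echo "loose objects: $loose_before -> $loose_after"
```

The "size-pack" line in the output is the figure that dominates the size of the .git folder once everything is packed.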

In Git, the "git repack" command combines objects into pack files to optimize storage and improve performance. Git first writes each new object (a commit, a tree for a directory structure, or a blob for file contents) as an individual "loose" file. Over time, these loose objects accumulate and take up disk space.

Git's packing mechanism helps reduce the size of a repository by compressing and combining these individual objects into pack files. A pack file contains a collection of objects that are stored together, resulting in more efficient storage and faster data transfer.

The "git repack" command allows you to manually trigger the repacking process. By default, Git automatically performs this process when necessary, but you can use "git repack" to exert more control or optimize the repository manually.

Here are a few common use cases for "git repack":

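  1. Combining objects: Running "git repack" without any options packs the loose objects that are not yet in a pack into a new pack file. To consolidate everything into a single pack file, use "git repack -a -d": "-a" packs all referenced objects together and "-d" deletes the packs this makes redundant. This consolidation can improve the performance of some Git operations.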

  2. Loosening unreachable objects: With "git repack -A -d", objects in old packs that are no longer reachable are written back out as loose files instead of being kept in a pack, so they are not deleted immediately and can be expired later by age. The "--unpack-unreachable=<when>" option skips loosening unreachable objects older than <when>. Reachable objects stay packed.

  3. Adjusting pack settings: The "git repack" command provides various options to tweak the packing process. For example, "--window" and "--depth" tune the delta search, "-f" recomputes deltas instead of reusing existing ones, and "-d" removes redundant packs. The compression level is set through configuration ("pack.compression") rather than through a "git repack" option.

It's important to note that "git repack" is a relatively low-level command, and in most cases, you won't need to use it directly. Git's automatic packing mechanism is typically sufficient for managing repositories efficiently.
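
The difference between an incremental repack and a full repack can be seen by counting pack files. This is a minimal sketch in a scratch repository; the helper "count_packs" and the file names are invented for the example.

```shell
set -eu
repo=$(mktemp -d)             # hypothetical scratch repository
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Example"
git config gc.auto 0          # keep automatic gc from repacking behind our back

# Helper (not a Git command): number of pack files in the repository.
count_packs() { find .git/objects/pack -name '*.pack' | wc -l | tr -d ' '; }

# Two rounds of commits, each followed by an incremental repack.
for round in 1 2; do
  echo "round $round" > notes.txt
  git add notes.txt
  git commit -q -m "round $round"
  git repack -q -d            # packs only the objects that are still loose
done
packs_incremental=$(count_packs)

git repack -q -a -d           # rewrites everything into one pack, drops the old ones
packs_full=$(count_packs)

echo "packs after incremental repacks: $packs_incremental"
echo "packs after git repack -a -d:    $packs_full"
```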

git gc vs git prune

git gc is the parent command and git prune is the child: git gc internally runs git prune. git prune removes loose objects that are unreachable from any reference. When it is run through git gc, it only removes objects older than the configured grace period (gc.pruneExpire, two weeks by default).

In Git, the process of garbage collection is responsible for cleaning up unnecessary objects and optimizing the storage of a repository. Git's garbage collection mechanism ensures that objects that are no longer reachable from any branch or tag are removed from the repository, freeing up disk space and improving performance.

Here's a high-level overview of how Git garbage collection works:

  1. Identifying unreferenced objects: Git starts by identifying objects that are no longer reachable. It begins with known references such as branches, tags, and other pointers, and traverses the commit graph to find all reachable objects. Any objects that are not encountered during this traversal are considered unreferenced and eligible for removal.

  2. Marking reachable objects: Git marks all reachable objects as "in-use" during the traversal process. This marking ensures that they are not accidentally deleted.

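  3. Deleting unreferenced objects: Once the reachable objects are marked, Git deletes the unreferenced objects, by default only those older than a grace period of two weeks ("gc.pruneExpire"). The grace period protects objects that a concurrently running Git command may be about to reference. The deletion is permanent and frees up disk space.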

  4. Packing objects: Git also repacks the repository, consolidating loose objects and existing packs into compressed pack files. This packing process reduces the overall size of the repository and speeds up data transfer.

  5. Reflog expiration: Git also cleans up expired entries in the reflog during garbage collection. The reflog records the history of branch and HEAD movements, and over time it accumulates entries that are no longer needed. Because reflog entries keep the commits they point to reachable, an object can only be pruned once the entries referring to it have expired.
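
The steps above can be observed on a single commit. The sketch below, in a scratch repository with made-up branch and file names, abandons a commit and checks whether it survives an ordinary "git gc" and then a "git gc" run after the reflog has been expired.

```shell
set -eu
repo=$(mktemp -d)             # hypothetical scratch repository
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Example"

echo base > file.txt
git add file.txt
git commit -q -m "base"
main_branch=$(git symbolic-ref --short HEAD)

# Make a commit on a scratch branch, then delete the branch.
git checkout -q -b scratch
echo scratch > scratch.txt
git add scratch.txt
git commit -q -m "scratch work"
abandoned=$(git rev-parse HEAD)
git checkout -q "$main_branch"
git branch -q -D scratch

# No branch points at the commit, but HEAD's reflog still does.
git gc -q
if git cat-file -e "$abandoned" 2>/dev/null; then survived_gc=yes; else survived_gc=no; fi

# Expire the reflog and prune with no grace period.
git reflog expire --all --expire=now
git gc -q --prune=now
if git cat-file -e "$abandoned" 2>/dev/null; then survived_prune=yes; else survived_prune=no; fi

echo "after plain gc: $survived_gc, after reflog expire + prune: $survived_prune"
```

The second pair of commands is destructive: once it has run, the abandoned commit can no longer be recovered from this repository.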

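The garbage collection process in Git is typically triggered automatically: certain commands, such as "git commit" or "git merge", run "git gc --auto" when they finish. That call does real work only if the repository holds too many loose objects ("gc.auto", 6700 by default) or too many pack files ("gc.autoPackLimit", 50 by default). Git does not run garbage collection on a timer by default; scheduled background maintenance has to be enabled explicitly with "git maintenance start".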

However, you can manually trigger garbage collection using the "git gc" command. Unlike "git gc --auto", a plain "git gc" runs regardless of the thresholds. Options control various aspects of the collection: "--aggressive" spends more time on repacking, and "--prune=<date>" changes the grace period for deleting unreachable objects.
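
As a rough sketch of how the thresholds and the manual command fit together, the following sets the two automatic-gc settings for one repository and then runs both forms of "git gc". The values 100 and 10 are arbitrary examples, not recommendations, and the scratch repository is invented for the illustration.

```shell
set -eu
repo=$(mktemp -d)             # hypothetical scratch repository
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Example"
echo hello > file.txt
git add file.txt
git commit -q -m "initial"

# Lower the thresholds for this repository only (example values).
git config gc.auto 100            # loose-object threshold for "git gc --auto"
git config gc.autoPackLimit 10    # pack-count threshold for "git gc --auto"

# Does nothing here: the repository is far below both thresholds.
git gc --auto -q

# Runs unconditionally, with thorough repacking and no pruning grace period.
git gc -q --aggressive --prune=now

configured_auto=$(git config --get gc.auto)
configured_packs=$(git config --get gc.autoPackLimit)
echo "gc.auto=$configured_auto gc.autoPackLimit=$configured_packs"
```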

It's worth noting that for most Git users, manual garbage collection is rarely necessary, as Git's automatic garbage collection mechanisms generally handle the cleanup and optimization of repositories effectively.

In Git, the reflog (reference log) is a mechanism that records the history of changes to local references such as HEAD, branches, and remote-tracking branches (tags are not logged by default). It serves as a safety net, allowing you to recover lost commits or branches, undo unintended changes, and navigate through the recent history of reference updates within your local repository.

Here are a few key points about the reflog:

  1. Recording reference updates: Whenever a reference, such as a branch or HEAD, changes in Git, the reflog captures that update. This includes actions such as creating a new branch, checking out a different commit, rebasing, merging, or amending commits. Each reference has its own reflog.

  2. Storing reference states: The reflog records the commit hash (SHA-1) that the reference pointed to before the update and the commit hash it points to after the update. It also includes a timestamp and information about the action that triggered the update.

  3. Local to each repository: The reflog is specific to each Git repository and is not shared with remote repositories during push or fetch operations. Each clone of a repository maintains its own reflog.

  4. Limited lifespan: The reflog does not retain the history of reference updates indefinitely. By default, it retains entries for 90 days ("gc.reflogExpire"), and entries that point to commits no longer reachable from the current tip of the reference for 30 days ("gc.reflogExpireUnreachable"). After that, older entries are pruned automatically during Git's garbage collection.

  5. Recovery and navigation: The reflog provides a way to navigate through the recent history of reference updates and recover lost commits or branches. Running "git reflog" shows the reflog of HEAD; "git reflog show <ref>" shows the reflog of a specific reference. The output lists reference updates from newest to oldest, allowing you to identify the commit hash associated with a previous state of a reference.

  6. Restoring lost commits or branches: If you accidentally delete a branch, reset a branch to an unintended state, or lose commits due to an operation, you can use the reflog to recover the lost state. By identifying the commit hash in the reflog, you can recreate a branch or reset a branch to its previous state.
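
Recovering a deleted branch from the reflog might look like the sketch below. The branch name "feature", the file names, and the scratch repository are assumptions for the example; in practice you would read the commit hash off the "git reflog" output rather than rely on a fixed position such as HEAD@{1}.

```shell
set -eu
repo=$(mktemp -d)             # hypothetical scratch repository
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Example"

echo base > file.txt
git add file.txt
git commit -q -m "base"
main_branch=$(git symbolic-ref --short HEAD)

# Do some work on a branch, then delete the branch by accident.
git checkout -q -b feature
echo work > feature.txt
git add feature.txt
git commit -q -m "feature work"
lost=$(git rev-parse HEAD)
git checkout -q "$main_branch"
git branch -q -D feature

# The branch's own reflog is gone, but HEAD's reflog still lists the commit.
git --no-pager reflog show HEAD

# HEAD@{0} is the checkout back to the main branch; HEAD@{1} is "feature work".
found=$(git rev-parse 'HEAD@{1}')

# Recreate the branch at the recovered commit.
git branch feature "$found"
echo "restored feature at $(git rev-parse --short feature)"
```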

It's important to note that the reflog is a local mechanism and should not be relied upon as a long-term backup solution. It primarily serves as a safety net within a single Git repository, helping you recover from recent changes or mistakes. For durable backup and collaboration, push your work to a remote repository.