---
layout: default
title: Home
nav_order: 1
description: "Working with git and GitHub in Data Science."
permalink: /
last_modified_date: 2025-01-20 02:13AM
---
{: .fs-9 }
Source Control Basics: How to set up, configure, and work with git
and GitHub.
{: .fs-6 .fw-300 }
Setting Up{: .btn .btn-primary .fs-5 .mb-4 .mb-md-0 .mr-2 } Basic Commands{: .btn .btn-primary .fs-5 .mb-4 .mb-md-0 .mr-2 } Advanced{: .btn .btn-primary .fs-5 .mb-4 .mb-md-0 .mr-2 } Actions{: .btn .btn-primary .fs-5 .mb-4 .mb-md-0 .mr-2 } Check{: .btn .btn-primary .fs-5 .mb-4 .mb-md-0 .mr-2 }
Source control, also known as "version control," is the process of tracking changes and versions of electronic files over time. Those files might include code, data, images, and more. Good source control tools let users see the entire history of changes to any specific file, or the evolution of an entire project. `git` is currently the most popular and feature-rich source control tool.
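To make "the entire history of changes to any specific file" concrete, here is a toy model in Python. It is a minimal sketch, not how `git` is implemented: the hypothetical `FileHistory` class keeps a full snapshot of one text file per save. It then uses the standard library's `difflib.unified_diff` to compare any two versions on demand, in a unified-diff format similar to what `git diff` prints. The file name `clean.py` and its contents are made up for illustration.

```python
import difflib
from dataclasses import dataclass, field


@dataclass
class FileHistory:
    """Toy, single-file version history (hypothetical, for illustration only).

    Every save stores a full snapshot of the text plus a message;
    differences between versions are computed only when requested.
    """

    name: str
    versions: list = field(default_factory=list)  # (message, text) pairs

    def save(self, text: str, message: str) -> int:
        """Record a snapshot and return its 0-based version number."""
        self.versions.append((message, text))
        return len(self.versions) - 1

    def log(self) -> list:
        """Return 'version: message' entries, newest first."""
        return [f"{i}: {msg}" for i, (msg, _) in reversed(list(enumerate(self.versions)))]

    def diff(self, old: int, new: int) -> str:
        """Return a unified diff between two saved versions."""
        before = self.versions[old][1].splitlines(keepends=True)
        after = self.versions[new][1].splitlines(keepends=True)
        return "".join(
            difflib.unified_diff(
                before,
                after,
                fromfile=f"{self.name}@{old}",
                tofile=f"{self.name}@{new}",
            )
        )


if __name__ == "__main__":
    history = FileHistory("clean.py")
    history.save("df = load()\n", "Load raw data")
    history.save("df = load()\ndf = df.dropna()\n", "Drop rows with missing values")
    print("\n".join(history.log()))
    print(history.diff(0, 1))
```

Real `git` tracks whole projects rather than single files, and identifies each version by a commit hash instead of a counter, but the idea is the same: keep every version, and compare any two of them when needed.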
Git is a distributed version control system that tracks changes in any set of computer files, usually used for coordinating work among programmers who are collaboratively developing source code during software development. Its goals include speed, data integrity, and support for distributed, non-linear workflows. (Wikipedia)
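The "data integrity" goal in that description is built into how `git` stores content: every version of a file (a "blob") is identified by a hash of its bytes. The sketch below assumes a repository using the default SHA-1 object format (repositories created with `git init --object-format=sha256` use SHA-256 instead) and a file with no line-ending conversion or other filters configured. Under those assumptions, it should reproduce the ID that `git hash-object <file>` reports. The function name `git_blob_hash` is our own, chosen for illustration.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 object ID git would assign to `content` as a blob.

    Assumes the default SHA-1 object format. git hashes a short header,
    'blob <size in bytes>' followed by a NUL byte, and then the raw bytes.
    """
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # Identical bytes always produce the same ID...
    print(git_blob_hash(b"df = load()\n"))
    # ...while changing even one byte (here, a Windows line ending) changes it completely.
    print(git_blob_hash(b"df = load()\r\n"))
```

For example, an empty file always hashes to `e69de29bb2d1d6434b8b29ae775ad8c2e48c5391`. Because the ID depends on every byte, content that has been corrupted or altered no longer matches its recorded ID, which is one way `git` can detect damage.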
GitHub is a developer platform that allows developers to create, store, manage and share their code. It uses Git software, providing the distributed version control of Git plus access control, bug tracking, software feature requests, task management, continuous integration, and wikis for every project. It currently hosts work by approximately 100M developers. (Wikipedia)
Data aggregation, data cleaning, pipelines, and ML models all depend on software to run. Responsible software management requires well-managed code, versioning, and clear prioritization of bugs, features, and user issues. At scale, modern platforms and infrastructure typically rely on code-driven tests, builds, deployments, and management. Code can define every layer of this work, across teams of engineers and data scientists.
Which is to say: Code is fundamental to our work, and it would be risky, inefficient, and impractical not to use source control.
- Setup
  - Install and set up `git`
  - Authenticate `git` to GitHub
  - Basic configuration
  - Troubleshooting authentication
- Creating and managing a repository
  - Create a repository locally
  - Create a repository in GitHub
  - Add or remove collaborators
- Source control basics
  - Diff
  - Status
  - Add
  - Commit
  - Push/Pull
  - Fetch
  - Log
- Branches, Forks, and Merges
  - Branches
  - Forks
  - Fetch from Upstream
  - Merges and Pull Requests
  - Issues
- Advanced Git/GitHub Features
  - Stash
  - Signing commits
  - Reset and Revert
  - Rebase
  - Cherry-pick
  - Renaming `origin`
- Bonus
  - GitHub Actions
    - About
    - Credentials & Secrets
    - Example 1 - Build software upon a push
    - Example 2 - Build and deploy a container
- Skills Check