Cric Stats Analyzer 🏏

A repository that holds the code, schemas, and SQL queries for the Cricket Statistics Analyzer.

Getting Started with Cric Stats Analyzer

Overview

Cric Stats Analyzer is a tool for analyzing historical men's cricket matches played in the ODI, T20, and Test formats. It loads match data from Cricsheet, transforms it, and stores it in BigQuery. The project aims to provide statistical reports, metrics, and dashboards, with an emphasis on the top 50 batsmen, bowlers, and all-rounders of all time. The solution prioritizes cost-efficiency and query performance.

Project Structure

The project is organized into three main components:

Cloud Function (to_bq.py):
- Orchestrates the ETL process.
- .env: Environment variable configuration.
- README.md: Documentation providing an overview of the Cloud Function.
Schema:
- tables: Schema definitions for BigQuery tables.
- views: Schema definitions for virtual views.
SQL Scripts:
- DDL: Data Definition Language (DDL) scripts for view creation.
- DML: Data Manipulation Language (DML) scripts for data manipulation.

Setup Steps

Cloud Function (to_bq.py):
- Clone the repository to your local machine.
- Review and update environment variables in the .env file.
- Ensure required Python libraries are installed using pip install -r requirements.txt.
Schema (schema folder):
- Explore schema files in the tables and views directories for detailed structure information.
- Make any necessary updates to the schema based on project requirements.
SQL Scripts (sql folder):
- Review DDL scripts for view creation.
- Examine DML scripts for data manipulation.
- Execute scripts in the desired order for setting up the BigQuery environment.
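
The environment-variable step above could be handled in code roughly as follows. This is a minimal sketch, not the contents of to_bq.py: the variable names (GCP_PROJECT, GCS_BUCKET, BQ_DATASET, ARCHIVE_PREFIX) and the PipelineConfig class are hypothetical, so substitute whatever the .env file actually defines.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class PipelineConfig:
    """Settings the ETL needs. All field names are assumptions."""
    project_id: str
    bucket_name: str
    dataset: str
    archive_prefix: str


# Hypothetical variable names; the real .env file may use different ones.
REQUIRED_VARS = ("GCP_PROJECT", "GCS_BUCKET", "BQ_DATASET")


def load_config(env: Optional[Mapping[str, str]] = None) -> PipelineConfig:
    """Build a config from environment variables, failing fast if any are missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError("Missing required environment variables: " + ", ".join(missing))
    return PipelineConfig(
        project_id=env["GCP_PROJECT"],
        bucket_name=env["GCS_BUCKET"],
        dataset=env["BQ_DATASET"],
        # Optional setting with a default value.
        archive_prefix=env.get("ARCHIVE_PREFIX", "archive/"),
    )
```

Failing at startup when a variable is missing tends to be easier to debug than a failure partway through a load.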

Functionality

Load Data into BigQuery Tables:
- Ingest historical cricket match data into separate BigQuery tables based on match types (ODI, T20, Test).
Source Data from Match Files:
- Retrieve data from match files stored in a Google Cloud Storage (GCS) bucket.
Statistical Reports, Metrics, Dashboards:
- Generate statistical reports, metrics, and dashboards over BigQuery tables.
- Focus on creating specific reports for the top 50 batsmen, bowlers, and all-rounders of all time.
Cost-Efficient Solution:
- Develop a cost-efficient solution for data processing and storage.
Optimal Query Performance:
- Ensure optimal performance for queries over the BigQuery tables.
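
As an illustration of the "top 50" reports, the ranking logic can be sketched locally in Python. In the project itself this aggregation would presumably live in a BigQuery view. The row shape used here (one dict per delivery with "batter" and "runs_batter" keys) is an assumption, not the repository's actual schema.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def top_batters(rows: Iterable[Dict], n: int = 50) -> List[Tuple[str, int]]:
    """Return the n highest run scorers as (batter, total_runs) pairs.

    Ties are broken alphabetically by name so the ordering is deterministic.
    """
    totals: Dict[str, int] = defaultdict(int)
    for row in rows:
        totals[row["batter"]] += row["runs_batter"]
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]


# Usage with a few hand-written delivery rows (illustrative data only).
sample_rows = [
    {"batter": "Player A", "runs_batter": 4},
    {"batter": "Player B", "runs_batter": 6},
    {"batter": "Player A", "runs_batter": 1},
    {"batter": "Player C", "runs_batter": 5},
]
print(top_batters(sample_rows, n=2))
```

The same pattern extends to bowlers (summing wickets) or to any other per-player metric.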

Architecture

Data Ingestion:
- A new data file is dropped into the GCS bucket.
Trigger Mechanism:
- Cloud Scheduler triggers the Cloud Function periodically to check for files in the GCS bucket.
Data Processing Workflow:
- The data processing workflow parses YAML (or CSV, XML, JSON) data.
- Performs data transformations.
- Loads the transformed data into BigQuery tables.
Archive Folder:
- On successful data ingestion, files in the GCS bucket are moved to the archive folder.
Generating Views:
- Views are generated using BigQuery.
Dashboards in Looker:
- The views are surfaced in Looker dashboards for visualization and analysis.
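
The parse, transform, and archive steps above can be sketched with the standard library alone. This is an illustrative outline under stated assumptions, not the actual to_bq.py: it assumes a Cricsheet-style JSON layout (an "info" object plus "innings", "overs", and "deliveries"), and the table names and helper functions are hypothetical. The GCS download and BigQuery load calls are left out.

```python
import json
from typing import Any, Dict, List

# Hypothetical table names, one per match type.
TABLE_BY_MATCH_TYPE = {
    "ODI": "odi_deliveries",
    "T20": "t20_deliveries",
    "Test": "test_deliveries",
}


def table_for(match_type: str) -> str:
    """Pick the destination table for a match type."""
    try:
        return TABLE_BY_MATCH_TYPE[match_type]
    except KeyError:
        raise ValueError("Unsupported match type: " + repr(match_type))


def flatten_match(raw: str, match_id: str) -> List[Dict[str, Any]]:
    """Turn one match file into flat, one-row-per-delivery records."""
    match = json.loads(raw)
    match_type = match["info"]["match_type"]
    rows = []
    for innings_no, innings in enumerate(match.get("innings", []), start=1):
        for over in innings.get("overs", []):
            for ball_no, delivery in enumerate(over.get("deliveries", []), start=1):
                runs = delivery.get("runs", {})
                rows.append({
                    "match_id": match_id,
                    "match_type": match_type,
                    "innings": innings_no,
                    "batting_team": innings.get("team"),
                    "over": over.get("over"),
                    "ball": ball_no,
                    "batter": delivery.get("batter"),
                    "bowler": delivery.get("bowler"),
                    "runs_batter": runs.get("batter", 0),
                    "runs_extras": runs.get("extras", 0),
                    "runs_total": runs.get("total", 0),
                    "wickets": len(delivery.get("wickets", [])),
                })
    return rows


def archive_path(object_name: str, prefix: str = "archive/") -> str:
    """Compute where a processed file would be moved inside the bucket."""
    return prefix + object_name.rsplit("/", 1)[-1]
```

Flattening to one row per delivery keeps the transformation simple and leaves aggregation to the BigQuery views.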

Services Used

Google Cloud Storage (GCS):
- Storage service for holding historical cricket match data in various formats.
Cloud Scheduler:
- Periodically triggers the Cloud Function to check for new data files in the GCS bucket.
Cloud Function:
- Executes the ETL process, parsing, transforming, and loading data into BigQuery.
BigQuery:
- Data warehouse for storing cricket match statistics and enabling powerful querying.
Looker:
- Platform for building dashboards and visualizing insights from the generated views in BigQuery.
Secret Manager:
- Google Cloud Secret Manager stores the secrets that the project accesses.

By following the setup steps above, you can configure and run the Cric Stats Analyzer project for cricket statistics analysis.

zain-muiz-sada/cric_stats_analyzer