Welcome to the GitHub repository of Thomas F McGeehan V, a seasoned Data Technology Architect with a rich portfolio spanning over two decades in the field of data engineering and analytics. I hold nearly a dozen patents and have led several high-impact projects across various industries, demonstrating a consistent commitment to excellence and innovation.
Expertise | Description |
---|---|
Data Architecture & Engineering | Designing and implementing resilient, performant, and scalable data platforms that cover every phase of the data lifecycle, from ingestion and integration to consumption and analytics. |
Machine Learning & AI | Democratizing machine learning applications, making advanced analytics accessible in innovative ways. |
Cloud Solutions | Extensive experience with major public cloud platforms as well as on-premises solutions. |
Project | Description |
---|---|
BigQuery BigFunctions | As an active contributor to the open-source BigFunctions project, I help develop advanced SQL functions that extend the capabilities of Google BigQuery, enabling more efficient and powerful data transformations and analyses. |
AddressMatchPro | A Go solution for approximate entity matching, focusing on standardizing street addresses in the USA. This project utilizes advanced algorithms to ensure high accuracy and efficiency in entity resolution tasks. |
PromptTriad | An innovative Go API hosted on Cloud Run that leverages three competing AI models (OpenAI, Gemini, and Cohere) to collaboratively engineer and optimize the best possible prompt from any given input. The project focuses on integrating these APIs, implementing response evaluation using cosine similarity, and providing robust logging and monitoring. |
GCS2Postgres | A Go-based solution designed to load various open data formats stored in Google Cloud Storage (GCS) and BigQuery into a PostgreSQL database. It supports multiple file formats, utilizes BigQuery for data processing, and ensures secure PostgreSQL credentials retrieval from Google Secret Manager. |
LinguisticLens | An API for analyzing text using OpenAI's language model, focusing on emotional, factual, and implicit aspects. It identifies and explores dark triad traits, hidden meanings, and tonal nuances to provide a comprehensive text analysis. Built using the Gin framework. |
BQ Multi Agent | A platform leveraging multiple AI agents to interact with Google BigQuery for enhanced data analytics. This project aims to optimize query performance and provide insightful data analytics through a multi-agent architecture. |
ArrowLake | A data lakehouse architecture integrating Apache Arrow and Iceberg to optimize large-scale data processing, analytics, and real-time streaming. This project includes vector database integration using pgvector for GenAI, LLMs, and transformer architectures, and leverages Storj for decentralized, secure, and scalable cloud storage. |
Flight | A Go implementation of the Apache Arrow Flight SQL protocol. This project enables efficient, high-performance data transport using Arrow Flight, facilitating interoperability and enhancing the data processing capabilities of modern data systems. |
Value | Description |
---|---|
Innovation | Continuously pushing the boundaries of technology to create solutions that not only meet current needs but also foresee and address future challenges. |
Leadership | Building and nurturing teams that are not only technically proficient but also innovative and forward-thinking. |
Excellence | Consistently striving to exceed expectations through high-quality work and persistent dedication to improving and evolving in all aspects of technology and leadership. |
Connect with me to discuss potential collaborations, or if you're looking for guidance or mentorship in data technology and architecture:
- My cat Lux's website: www.luxstl.com
- My LLC website: www.velocedata.io
- LeetCode profile: www.leetcode.com/u/tfmv