proxy added and enhance gnewsdecoder functionality
SSujitX committed Jan 18, 2025
1 parent 836bac1 commit d7bc07d
Showing 11 changed files with 383 additions and 95 deletions.
54 changes: 28 additions & 26 deletions .github/workflows/python-publish.yml
@@ -1,39 +1,41 @@
# This workflow will upload a Python Package using Twine when a release is created
# For more information see: https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-python#publishing-to-package-registries

# This workflow uses actions that are not certified by GitHub.
# They are provided by a third-party and are governed by
# separate terms of service, privacy policy, and support
# documentation.

name: Upload Python Package

on:
  push:
    tags:
      - "[0-9]+.[0-9]+.[0-9]+"
      - "[0-9]+.[0-9]+.[0-9]+a[0-9]+"
      - "[0-9]+.[0-9]+.[0-9]+b[0-9]+"
      - "[0-9]+.[0-9]+.[0-9]+rc[0-9]+"
    branches:
      - main
  release:
    types: [published]

permissions:
  contents: read

jobs:
  deploy:

    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v3
        with:
          python-version: '3.x'
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install build
      - name: Build package
        run: python -m build
      - name: Publish package
        uses: pypa/gh-action-pypi-publish@27b31702a0e7fc50959f5ad993c78deac1bdfc29
        with:
          user: __token__
          password: ${{ secrets.PYPI_API_TOKEN }}
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v3
        with:
          python-version: "3.12"

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install build
      - name: Build package
        run: python -m build

      - name: Publish package
        uses: pypa/gh-action-pypi-publish@release/v1
        with:
          user: __token__
          password: ${{ secrets.PYPI_API_TOKEN }}
6 changes: 2 additions & 4 deletions .gitignore
@@ -1,18 +1,16 @@
# Ignore Python bytecode
__pycache__/
*.pyc
*.pyo

# Ignore macOS system files
.DS_Store
.venv
.ignore
.backup_readme.md
.backup_readme2.md

# Ignore build artifacts
dist/
build/

# Ignore egg info
*.egg-info/

test.py
121 changes: 78 additions & 43 deletions README.md
@@ -1,32 +1,29 @@
[![PyPI version](https://badge.fury.io/py/googlenewsdecoder.svg)](https://badge.fury.io/py/googlenewsdecoder)
[![Python Versions](https://img.shields.io/badge/python-3.9-blue)](https://pypi.org/project/facebook-pages-scraper/)
[![Python Versions](https://img.shields.io/badge/python-3.9%20|%203.10%20|%203.11%20|%203.12%20|%203.13-blue)](https://pypi.org/project/googlenewsdecoder/)
[![Downloads](https://static.pepy.tech/badge/googlenewsdecoder)](https://pepy.tech/project/googlenewsdecoder)
[![Downloads](https://static.pepy.tech/badge/googlenewsdecoder/month)](https://pepy.tech/project/googlenewsdecoder)
[![Downloads](https://static.pepy.tech/badge/googlenewsdecoder/week)](https://pepy.tech/project/googlenewsdecoder)

# Google News Decoder

Google News Decoder is a Python package that can decode Google News links or Google News URLs to their original URLs. It is a simple tool that saves you time and effort. If you find it useful, please support the package by hitting the star on GitHub. Your support helps keep the project going!

[Pypi Package](https://pypi.org/project/googlenewsdecoder/)

## Update

- Version 0.1.6:

- Improved: Enhanced error handling with a fallback mechanism for decoding parameters.
- Refined: Optimized get_decoding_params to try decoding via https://news.google.com/articles first, falling back to https://news.google.com/rss/articles if needed
- Updated: Reduced occurrences of HTTP 429 (Too Many Requests).
- Removed: Logging functionality for a cleaner codebase.
- Fixed: Resolved time delay issue between requests.
- **Version 0.1.7**:
- **New Feature**: Added **proxy support** to handle rate limiting and bypass restrictions.
- **Improved**: Enhanced error handling with a fallback mechanism for decoding parameters.
- **Refined**: Optimized `get_decoding_params` to try decoding via `https://news.google.com/articles` first, falling back to `https://news.google.com/rss/articles` if needed.
- **Updated**: Reduced occurrences of HTTP 429 (Too Many Requests).
- **Removed**: Logging functionality for a cleaner codebase.
- **Fixed**: Resolved time delay issue between requests.
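
The fallback order described in the list above (try `https://news.google.com/articles` first, then `https://news.google.com/rss/articles`) can be sketched roughly as follows. This is a minimal illustration, not the package's actual `get_decoding_params` code: the helper name `first_successful`, the `fetch` callable, and the returned dictionary keys are assumptions made for the example.

```python
# Hypothetical sketch of an ordered-endpoint fallback; not the library's real code.
ARTICLE_ENDPOINTS = (
    "https://news.google.com/articles",
    "https://news.google.com/rss/articles",
)


def first_successful(article_id, fetch, endpoints=ARTICLE_ENDPOINTS):
    """Call fetch(url) for each endpoint in order; return the first success."""
    last_error = "no endpoints configured"
    for base in endpoints:
        url = f"{base}/{article_id}"
        try:
            # fetch is any callable that returns data or raises on failure.
            return {"status": True, "url": url, "params": fetch(url)}
        except Exception as exc:
            # Remember the failure and move on to the next endpoint.
            last_error = f"{url}: {exc}"
    return {"status": False, "message": last_error}
```

Passing the fetcher in as a callable keeps the sketch independent of any particular HTTP client, so the ordering logic can be exercised without network access.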

## Demo

![Google News Decoder](https://github.com/user-attachments/assets/3a3c3279-1c54-4e19-96cb-6f22f889aa2a)

## Installation

- You can install this package using pip:
You can install this package using pip:

```sh
pip install googlenewsdecoder
@@ -38,31 +35,70 @@ pip install googlenewsdecoder
pip install googlenewsdecoder --upgrade
```

## Supported Proxy Formats

- **HTTP/HTTPS Proxy**:

- **With authentication**: `http://user:pass@host:port` or `https://user:pass@host:port`
- **Without authentication**: `http://host:port` or `https://host:port`

- **SOCKS5 Proxy**:

- **With authentication**: `socks5://user:pass@host:port`
- **Without authentication**: `socks5://host:port`

- **IP and Port Only**:
- **HTTP**: `http://127.0.0.1:8080`
- **SOCKS5**: `socks5://127.0.0.1:1080`
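
Before passing a proxy string to the decoder, it may help to check that it matches one of the formats listed above. The sketch below uses only the standard library's `urllib.parse.urlsplit`; the helper name `describe_proxy` and the returned dictionary are hypothetical and are not part of `googlenewsdecoder`.

```python
from urllib.parse import urlsplit

# Schemes taken from the "Supported Proxy Formats" list.
SUPPORTED_SCHEMES = {"http", "https", "socks5"}


def describe_proxy(proxy):
    """Validate a proxy URL and summarize its parts (hypothetical helper)."""
    parts = urlsplit(proxy)
    if parts.scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported proxy scheme in {proxy!r}")
    # Every listed format carries both a host and an explicit port.
    if not parts.hostname or parts.port is None:
        raise ValueError(f"proxy must include host and port: {proxy!r}")
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "authenticated": parts.username is not None,
    }
```

For example, `describe_proxy("http://user:pass@localhost:8080")` reports an authenticated HTTP proxy, while a string without a supported scheme raises `ValueError`.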

## Usage

Here is an example of how to use this package with different decoders:

### Using new_decoderv1
### Using gnewsdecoder

```python
from googlenewsdecoder import new_decoderv1
from googlenewsdecoder import gnewsdecoder

def main():

    interval_time = 5 # default interval is 1 sec, if not specified
    interval_time = 1 # interval is optional, default is None

    source_url = "https://news.google.com/read/CBMi2AFBVV95cUxPd1ZCc1loODVVNHpnbFFTVHFkTG94eWh1NWhTeE9yT1RyNTRXMVV2S1VIUFM3ZlVkVjl6UHh3RkJ0bXdaTVRlcHBjMWFWTkhvZWVuM3pBMEtEdlllRDBveGdIUm9GUnJ4ajd1YWR5cWs3VFA5V2dsZnY1RDZhVDdORHRSSE9EalF2TndWdlh4bkJOWU5UMTdIV2RCc285Q2p3MFA4WnpodUNqN1RNREMwa3d5T2ZHS0JlX0MySGZLc01kWDNtUEkzemtkbWhTZXdQTmdfU1JJaXY?hl=en-US&gl=US&ceid=US%3Aen"

    try:
        decoded_url = new_decoderv1(source_url, interval=interval_time)
        decoded_url = gnewsdecoder(source_url, interval=interval_time)

        if decoded_url.get("status"):
            print("Decoded URL:", decoded_url["decoded_url"])
        else:
            print("Error:", decoded_url["message"])
    except Exception as e:
        print(f"Error occurred: {e}")

    # Output: decoded_urls - [{'status': True, 'decoded_url': 'https://healthdatamanagement.com/articles/empowering-the-quintuple-aim-embracing-an-essential-architecture/'}]
if __name__ == "__main__":
    main()
```

### Using gnewsdecoder with proxy

```python
from googlenewsdecoder import gnewsdecoder

def main():
    interval_time = 1 # interval is optional, default is None
    proxy = "http://user:pass@localhost:8080" # proxy is optional, default is None

    source_url = "https://news.google.com/read/CBMi2AFBVV95cUxPd1ZCc1loODVVNHpnbFFTVHFkTG94eWh1NWhTeE9yT1RyNTRXMVV2S1VIUFM3ZlVkVjl6UHh3RkJ0bXdaTVRlcHBjMWFWTkhvZWVuM3pBMEtEdlllRDBveGdIUm9GUnJ4ajd1YWR5cWs3VFA5V2dsZnY1RDZhVDdORHRSSE9EalF2TndWdlh4bkJOWU5UMTdIV2RCc285Q2p3MFA4WnpodUNqN1RNREMwa3d5T2ZHS0JlX0MySGZLc01kWDNtUEkzemtkbWhTZXdQTmdfU1JJaXY?hl=en-US&gl=US&ceid=US%3Aen"

    try:
        decoded_url = gnewsdecoder(source_url, interval=interval_time, proxy=proxy)

        if decoded_url.get("status"):
            print("Decoded URL:", decoded_url["decoded_url"])
        else:
            print("Error:", decoded_url["message"])
    except Exception as e:
        print(f"Error occurred: {e}")

if __name__ == "__main__":
    main()
@@ -71,54 +107,53 @@ if __name__ == "__main__":
### Using a for loop to decode multiple URLs

```python
from googlenewsdecoder import new_decoderv1
from googlenewsdecoder import gnewsdecoder

def main():
    interval_time = 1 # interval is optional, default is None

    interval_time = 5 # default interval is None, if not specified

    source_urls = ["https://news.google.com/read/CBMilgFBVV95cUxOM0JJaFRwV2dqRDk5dEFpWmF1cC1IVml5WmVtbHZBRXBjZHBfaUsyalRpa1I3a2lKM1ZnZUI4MHhPU2sydi1nX3JrYU0xWjhLaHNfU0N6cEhOYVE2TEptRnRoZGVTU3kzZGJNQzc2aDZqYjJOR0xleTdsemdRVnJGLTVYTEhzWGw4Z19lR3AwR0F1bXlyZ0HSAYwBQVVfeXFMTXlLRDRJUFN5WHg3ZTI0X1F4SjN6bmFIck1IaGxFVVZyOFQxdk1JT3JUbl91SEhsU0NpQzkzRFdHSEtjVGhJNzY4ZTl6eXhESUQ3XzdWVTBGOGgwSmlXaVRmU3BsQlhPVjV4VWxET3FQVzJNbm5CUDlUOHJUTExaME5YbjZCX1NqOU9Ta3U?hl=en-US&gl=US&ceid=US%3Aen","https://news.google.com/read/CBMiiAFBVV95cUxQOXZLdC1hSzFqQVVLWGJVZzlPaDYyNjdWTURScV9BbVp0SWhFNzZpSWZxSzdhc0tKbVlHMU13NmZVOFdidFFkajZPTm9SRnlZMWFRZ01CVHh0dXU0TjNVMUxZNk9Ibk5DV3hrYlRiZ20zYkIzSFhMQVVpcTFPc00xQjhhcGV1aXM00gF_QVVfeXFMTmtFQXMwMlY1el9WY0VRWEh5YkxXbHF0SjFLQVByNk1xS3hpdnBuUDVxOGZCQXl1QVFXaUVpbk5lUGgwRVVVT25tZlVUVWZqQzc4cm5MSVlfYmVlclFTOUFmTHF4eTlfemhTa2JKeG14bmNabENkSmZaeHB4WnZ5dw?hl=en-US&gl=US&ceid=US%3Aen"]
    source_urls = [
        "https://news.google.com/read/CBMilgFBVV95cUxOM0JJaFRwV2dqRDk5dEFpWmF1cC1IVml5WmVtbHZBRXBjZHBfaUsyalRpa1I3a2lKM1ZnZUI4MHhPU2sydi1nX3JrYU0xWjhLaHNfU0N6cEhOYVE2TEptRnRoZGVTU3kzZGJNQzc2aDZqYjJOR0xleTdsemdRVnJGLTVYTEhzWGw4Z19lR3AwR0F1bXlyZ0HSAYwBQVVfeXFMTXlLRDRJUFN5WHg3ZTI0X1F4SjN6bmFIck1IaGxFVVZyOFQxdk1JT3JUbl91SEhsU0NpQzkzRFdHSEtjVGhJNzY4ZTl6eXhESUQ3XzdWVTBGOGgwSmlXaVRmU3BsQlhPVjV4VWxET3FQVzJNbm5CUDlUOHJUTExaME5YbjZCX1NqOU9Ta3U?hl=en-US&gl=US&ceid=US%3Aen",
        "https://news.google.com/read/CBMiiAFBVV95cUxQOXZLdC1hSzFqQVVLWGJVZzlPaDYyNjdWTURScV9BbVp0SWhFNzZpSWZxSzdhc0tKbVlHMU13NmZVOFdidFFkajZPTm9SRnlZMWFRZ01CVHh0dXU0TjNVMUxZNk9Ibk5DV3hrYlRiZ20zYkIzSFhMQVVpcTFPc00xQjhhcGV1aXM00gF_QVVfeXFMTmtFQXMwMlY1el9WY0VRWEh5YkxXbHF0SjFLQVByNk1xS3hpdnBuUDVxOGZCQXl1QVFXaUVpbk5lUGgwRVVVT25tZlVUVWZqQzc4cm5MSVlfYmVlclFTOUFmTHF4eTlfemhTa2JKeG14bmNabENkSmZaeHB4WnZ5dw?hl=en-US&gl=US&ceid=US%3Aen"
    ]

    for url in source_urls:
        try:
            decoded_url = new_decoderv1(url, interval=interval_time)
            decoded_url = gnewsdecoder(url, interval=interval_time)
            if decoded_url.get("status"):
                print("Decoded URL:", decoded_url["decoded_url"])
            else:
                print("Error:", decoded_url["message"])
        except Exception as e:
            print(f"Error occurred: {e}")

    # Output: decoded_url - {'status': True, 'decoded_url': 'https://healthdatamanagement.com/articles/empowering-the-quintuple-aim-embracing-an-essential-architecture/'}


if __name__ == "__main__":
    main()
```



### Using a proxy to deal with rate limiting
### Using a for loop to decode multiple URLs with Proxy

```python
from googlenewsdecoder import new_decoderv1
from googlenewsdecoder import gnewsdecoder

def main():
    interval_time = 1 # interval is optional, default is None
    proxy = "http://user:pass@localhost:8080" # proxy is optional, default is None

    interval_time = 5 # default interval is 1 sec, if not specified

    source_url = "https://news.google.com/read/CBMi2AFBVV95cUxPd1ZCc1loODVVNHpnbFFTVHFkTG94eWh1NWhTeE9yT1RyNTRXMVV2S1VIUFM3ZlVkVjl6UHh3RkJ0bXdaTVRlcHBjMWFWTkhvZWVuM3pBMEtEdlllRDBveGdIUm9GUnJ4ajd1YWR5cWs3VFA5V2dsZnY1RDZhVDdORHRSSE9EalF2TndWdlh4bkJOWU5UMTdIV2RCc285Q2p3MFA4WnpodUNqN1RNREMwa3d5T2ZHS0JlX0MySGZLc01kWDNtUEkzemtkbWhTZXdQTmdfU1JJaXY?hl=en-US&gl=US&ceid=US%3Aen"

    try:
        decoded_url = new_decoderv1(source_url, proxy="http://user:pass@localhost:8080")
        if decoded_url.get("status"):
            print("Decoded URL:", decoded_url["decoded_url"])
        else:
            print("Error:", decoded_url["message"])
    except Exception as e:
        print(f"Error occurred: {e}")
    source_urls = [
        "https://news.google.com/read/CBMilgFBVV95cUxOM0JJaFRwV2dqRDk5dEFpWmF1cC1IVml5WmVtbHZBRXBjZHBfaUsyalRpa1I3a2lKM1ZnZUI4MHhPU2sydi1nX3JrYU0xWjhLaHNfU0N6cEhOYVE2TEptRnRoZGVTU3kzZGJNQzc2aDZqYjJOR0xleTdsemdRVnJGLTVYTEhzWGw4Z19lR3AwR0F1bXlyZ0HSAYwBQVVfeXFMTXlLRDRJUFN5WHg3ZTI0X1F4SjN6bmFIck1IaGxFVVZyOFQxdk1JT3JUbl91SEhsU0NpQzkzRFdHSEtjVGhJNzY4ZTl6eXhESUQ3XzdWVTBGOGgwSmlXaVRmU3BsQlhPVjV4VWxET3FQVzJNbm5CUDlUOHJUTExaME5YbjZCX1NqOU9Ta3U?hl=en-US&gl=US&ceid=US%3Aen",
        "https://news.google.com/read/CBMiiAFBVV95cUxQOXZLdC1hSzFqQVVLWGJVZzlPaDYyNjdWTURScV9BbVp0SWhFNzZpSWZxSzdhc0tKbVlHMU13NmZVOFdidFFkajZPTm9SRnlZMWFRZ01CVHh0dXU0TjNVMUxZNk9Ibk5DV3hrYlRiZ20zYkIzSFhMQVVpcTFPc00xQjhhcGV1aXM00gF_QVVfeXFMTmtFQXMwMlY1el9WY0VRWEh5YkxXbHF0SjFLQVByNk1xS3hpdnBuUDVxOGZCQXl1QVFXaUVpbk5lUGgwRVVVT25tZlVUVWZqQzc4cm5MSVlfYmVlclFTOUFmTHF4eTlfemhTa2JKeG14bmNabENkSmZaeHB4WnZ5dw?hl=en-US&gl=US&ceid=US%3Aen"
    ]

    # Output: decoded_urls - [{'status': True, 'decoded_url': 'https://healthdatamanagement.com/articles/empowering-the-quintuple-aim-embracing-an-essential-architecture/'}]
    for url in source_urls:
        try:
            decoded_url = gnewsdecoder(url, interval=interval_time, proxy=proxy)
            if decoded_url.get("status"):
                print("Decoded URL:", decoded_url["decoded_url"])
            else:
                print("Error:", decoded_url["message"])
        except Exception as e:
            print(f"Error occurred: {e}")

if __name__ == "__main__":
    main()
33 changes: 32 additions & 1 deletion googlenewsdecoder/__init__.py
@@ -1,5 +1,36 @@
from .new_decoderv1 import decode_google_news_url as new_decoderv1
from .decoderv1 import decode_google_news_url as decoderv1
from .decoderv2 import decode_google_news_url as decoderv2
from .decoderv3 import decode_google_news_url as decoderv3
from .decoderv4 import decode_google_news_url as decoderv4
from .new_decoderv1 import decode_google_news_url as new_decoderv1
from .new_decoderv2 import GoogleDecoder
from .__version__ import __version__


def gnewsdecoder(source_url, interval=None, proxy=None):
    """
    Decodes a Google News article URL into its original source URL.
    This is a convenience function that uses the GoogleDecoder class internally.

    Parameters:
        source_url (str): The Google News article URL.
        interval (int, optional): Delay time in seconds before decoding to avoid rate limits.
        proxy (str, optional): Proxy to be used for all requests.

    Returns:
        dict: A dictionary containing 'status' and 'decoded_url' if successful,
        otherwise 'status' and 'message'.
    """
    decoder = GoogleDecoder(proxy=proxy)
    return decoder.decode_google_news_url(source_url, interval=interval)


__all__ = [
    "decoderv1",
    "decoderv2",
    "decoderv3",
    "decoderv4",
    "new_decoderv1",
    "GoogleDecoder",
    "gnewsdecoder",
]
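
The docstring above says the result is a dictionary with `'status'` and `'decoded_url'` on success, or `'status'` and `'message'` otherwise. A caller could wrap that contract roughly as sketched below. The helper `decode_many` is hypothetical and not part of this commit; it takes the decoder as a parameter, so it assumes only the documented dictionary shape.

```python
def decode_many(urls, decode, **kwargs):
    """Split results into successes and failures (hypothetical helper).

    decode is any callable with the documented return shape, for example
    gnewsdecoder; kwargs such as interval or proxy are forwarded unchanged.
    """
    decoded, failed = {}, {}
    for url in urls:
        try:
            result = decode(url, **kwargs)
        except Exception as exc:
            # Treat raised exceptions the same way as a status of False.
            failed[url] = str(exc)
            continue
        if result.get("status"):
            decoded[url] = result["decoded_url"]
        else:
            failed[url] = result.get("message", "unknown error")
    return decoded, failed
```

A call such as `decode_many(source_urls, gnewsdecoder, interval=1)` would then return one mapping of source URLs to decoded URLs and one mapping of source URLs to error messages.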
1 change: 1 addition & 0 deletions googlenewsdecoder/__version__.py
@@ -0,0 +1 @@
__version__ = "0.1.7"