Merge pull request #630 from podverse/develop
Release v4.13.3
mitchdowney authored Jul 7, 2023
2 parents 0deb1a6 + 24fbeef commit d24b4b6
Showing 6 changed files with 268 additions and 119 deletions.
140 changes: 30 additions & 110 deletions README.md
@@ -2,29 +2,30 @@

Data API, database migration scripts, and backend services for the Podverse ecosystem

## Getting started

### Local Development and Deployment

This repo contains steps for running podverse-api locally for development.

For stage/prod deployment instructions, please refer to the
[podverse-ops docs](https://github.com/podverse/podverse-ops).
- [Getting started](#getting-started)
* [NPM or Yarn](#npm-or-yarn)
* [Setup environment variables](#setup-environment-variables)
* [Install node_modules](#install-node-modules)
* [Start dev server](#start-dev-server)
* [Populate database](#populate-database)
* [Add podcast categories to the database](#add-podcast-categories-to-the-database)
* [Sync podcast data with Podcast Index API](#sync-podcast-data-with-podcast-index-api)
* [Matomo page tracking and analytics](#matomo-page-tracking-and-analytics)
* [More info](#more-info)

<small><i><a href='http://ecotrust-canada.github.io/markdown-toc/'>generated with markdown-toc</a></i></small>

### Prereqs
## Getting started

Before you can run podverse-api you will need a local Postgres version 11.5 database running.
If you are looking to run this app or contribute to Podverse for the first time, please read the sections that are relevant to you in our [CONTRIBUTING.md](https://github.com/podverse/podverse-ops/blob/master/CONTRIBUTING.md) file in the podverse-ops repo. Among other things, that file contains instructions for running a local instance of the Podverse database.

You can setup your own database, or go to the
[podverse-ops repo](https://github.com/podverse/podverse-ops), add the podverse-db-local.env file as explained in the docs, then run this command:
### NPM or Yarn

```bash
docker-compose -f docker-compose.local.yml up -d podverse_db
```
We use yarn and maintain a `yarn.lock` file, but using yarn is not a requirement. This documentation uses npm in its examples, though we generally run the yarn equivalents of those commands.

### Setup environment variables

For local development, environment variables are provided by a local .env file. Duplicate the .env.example file, rename it to .env, and update all of the environment variables to match what is needed for your environment.
For local development, environment variables are provided by a local `.env` file. You can find a link to example `.env` files in the [CONTRIBUTING.md](https://github.com/podverse/podverse-ops/blob/master/CONTRIBUTING.md) file.

### Install node_modules

@@ -38,113 +39,32 @@ npm install
npm run dev
```

### Sample database data
### Populate database

**TODO: Sample db instructions are out of date**
The [podverse-ops repo](https://github.com/podverse/podverse-ops) contains the qa-database.sql file to help you get started quickly with a development database. You can clone the podverse-ops repo, then run the following command after the Postgres database is running:

```bash
psql -h 0.0.0.0 -p 5432 -U postgres -W -f ./sample-database/qa-database.sql
```

The password for the .sql file is: mysecretpw
Instructions for this can be found in the [podverse-ops CONTRIBUTING.md file](https://github.com/podverse/podverse-ops/blob/master/CONTRIBUTING.md).

### Add podcast categories to the database

```bash
npm run dev:seeds:categories
```

### Add feed urls to the database

To add podcasts to the database, you first need to add feed urls to the
database, and then run the podcast parser with those feed urls.

You can pass multiple feed urls as a comma-delimited string parameter to the
`npm run dev:scripts:addFeedUrls` command.

A list of sample podcast feed urls can be found in
[podverse-api/docs/sampleFeedUrls.txt](https://github.com/podverse/podverse-api/tree/deploy/docs/sampleFeedUrls.txt).

```bash
npm run dev:scripts:addFeedUrls <feed urls>
```

### Parse feed urls to add podcasts and episodes to the database

Orphan feed urls do not have a podcast associated with them.

```bash
npm run dev:scripts:parseOrphanFeedUrls
```

To parse all non-orphan and public feed urls, you can run:

```bash
npm run dev:scripts:parsePublicFeedUrls
```

### Use SQS to add feed urls to a queue, then parse them

This project uses AWS SQS for its remote queue.

```bash
npm run dev:scripts:addAllOrphanFeedUrlsToPriorityQueue
```

or:

```bash
npm run dev:scripts:addAllPublicFeedUrlsToQueue
```

or:

```bash
npm run dev:scripts:addNonPodcastIndexFeedUrlsToPriorityQueue
```

or, to add all public feeds that were recently updated (according to Podcast Index) to the priority queue:
If you are creating a database from scratch, and are not using the `populateDatabase` command explained in the CONTRIBUTING.md file, then you will need to populate the database with categories.

```bash
yarn dev:scripts:addRecentlyUpdatedFeedUrlsToPriorityQueue
```

After you have added feed urls to a queue, you can retrieve and then parse
the feed urls by running:

```bash
npm run dev:scripts:parseFeedUrlsFromQueue <restartTimeOut> <queueType>
# restartTimeOut is in milliseconds; queueType is optional, and its only accepted value is "priority"
npm run dev:seeds:categories
```

We also have a self-managed parsing queue, where we manually mark podcasts to be added to a separate queue for parsing at a regular cadence. The `Podcast.parsingPriority` property holds a value between 0 and 5: 0 is the default and means the podcast should not be added to the self-managed queue; 1 is parsed most frequently, and 5 least frequently.
### Sync podcast data with Podcast Index API

At the time of writing, 3 is the value we use the most; it adds the feed to the queue every 30 minutes.
Podverse maintains its own podcast directory, and parses RSS feeds to populate it with data.

The `offset` value is optional, and probably not needed.
However, in prod Podverse syncs its database with the [Podcast Index API](https://podcastindex.org/), the world's largest open podcast directory and maintainer of the "Podcasting 2.0" RSS spec.

```bash
npm run dev:scripts:addFeedsToQueueByPriority <parsingPriority> <offset>
```
We run scripts on a cron interval that request from the PI API a list of all the podcasts it has detected updates in over the past X minutes, then add those podcast IDs to an Amazon SQS queue for parsing. Our parser containers, which run continuously, pull items from the queue, run our parser logic over them, and save the parsed data to our database.
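Under stated assumptions, that pipeline can be sketched in TypeScript. Every name below (`PIFeed`, `selectRecentlyUpdated`, the SQS call) is illustrative, not the real podverse-api code:

```typescript
// Sketch of the Podcast Index sync pipeline described above.
// All names here are illustrative, not the actual podverse-api implementation.

interface PIFeed {
  id: number;
  lastUpdateTime: number; // unix seconds, as reported by Podcast Index
}

// Pure helper: pick the feeds Podcast Index reports as updated
// within the last `windowSeconds`, relative to `nowSeconds`.
function selectRecentlyUpdated(feeds: PIFeed[], windowSeconds: number, nowSeconds: number): number[] {
  return feeds
    .filter((f) => nowSeconds - f.lastUpdateTime <= windowSeconds)
    .map((f) => f.id);
}

// The cron job would then enqueue those ids to SQS (sketched, not runnable here):
// for (const id of selectRecentlyUpdated(feeds, 15 * 60, nowSeconds)) {
//   await sqs.sendMessage({ QueueUrl: queueUrl, MessageBody: String(id) }).promise();
// }
```

The parser containers on the other side of the queue would pull message batches, parse each feed, and write the results to Postgres.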

Then to parse from the self-managed queue call:
If you'd like to run your own full instance of Podverse and want a thorough explanation of the processes involved, please contact us and we can document it.

```bash
npm run dev:scripts:parseFeedUrlsFromQueue
```
### Matomo page tracking and analytics

### Request Google Analytics pageview data and save to database
TODO: explain the Matomo setup

Below are sample commands for requesting unique pageview data from Google
Analytics, which is used throughout the site for sorting by popularity (not a
great/accurate system for popularity sorting...).

```bash
npm run dev:scripts:queryUniquePageviews -- clips month
npm run dev:scripts:queryUniquePageviews -- episodes week
npm run dev:scripts:queryUniquePageviews -- podcasts allTime
```
### More info

See the [podverse-ops repo](https://github.com/podverse/podverse-ops) for a sample
cron configuration for querying the Google API on a timer.
We used to have a more detailed README file, but I removed most of the content, since it is unnecessary for most local development workflows, and the information in it was getting out of date. If you're looking for more info though, you can try digging through our [old README file here](https://github.com/podverse/podverse-api/blob/develop/docs/old/old-readme.md).
150 changes: 150 additions & 0 deletions docs/old/old-readme.md
@@ -0,0 +1,150 @@
# podverse-api

Data API, database migration scripts, and backend services for the Podverse ecosystem

## Getting started

### Local Development and Deployment

This repo contains steps for running podverse-api locally for development.

For stage/prod deployment instructions, please refer to the
[podverse-ops docs](https://github.com/podverse/podverse-ops).

### Prereqs

Before you can run podverse-api you will need a local Postgres version 11.5 database running.

You can setup your own database, or go to the
[podverse-ops repo](https://github.com/podverse/podverse-ops), add the podverse-db-local.env file as explained in the docs, then run this command:

```bash
docker-compose -f docker-compose.local.yml up -d podverse_db
```

### Setup environment variables

For local development, environment variables are provided by a local .env file. Duplicate the .env.example file, rename it to .env, and update all of the environment variables to match what is needed for your environment.

### Install node_modules

```bash
npm install
```

### Start dev server

```bash
npm run dev
```

### Sample database data

**TODO: Sample db instructions are out of date**
The [podverse-ops repo](https://github.com/podverse/podverse-ops) contains the qa-database.sql file to help you get started quickly with a development database. You can clone the podverse-ops repo, then run the following command after the Postgres database is running:

```bash
psql -h 0.0.0.0 -p 5432 -U postgres -W -f ./sample-database/qa-database.sql
```

The password for the .sql file is: mysecretpw

### Add podcast categories to the database

```bash
npm run dev:seeds:categories
```

### Add feed urls to the database

To add podcasts to the database, you first need to add feed urls to the
database, and then run the podcast parser with those feed urls.

You can pass multiple feed urls as a comma-delimited string parameter to the
`npm run dev:scripts:addFeedUrls` command.

A list of sample podcast feed urls can be found in
[podverse-api/docs/sampleFeedUrls.txt](https://github.com/podverse/podverse-api/tree/deploy/docs/sampleFeedUrls.txt).

```bash
npm run dev:scripts:addFeedUrls <feed urls>
```
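As a sketch of the contract described above — `<feed urls>` is a single comma-delimited string — a splitter might look like this (hypothetical helper, not the actual script):

```typescript
// Split a comma-delimited feed url argument into individual urls,
// trimming whitespace and dropping empty entries.
function parseFeedUrlArg(arg: string): string[] {
  return arg
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}
```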

### Parse feed urls to add podcasts and episodes to the database

Orphan feed urls do not have a podcast associated with them.

```bash
npm run dev:scripts:parseOrphanFeedUrls
```

To parse all non-orphan and public feed urls, you can run:

```bash
npm run dev:scripts:parsePublicFeedUrls
```

### Use SQS to add feed urls to a queue, then parse them

This project uses AWS SQS for its remote queue.

```bash
npm run dev:scripts:addAllOrphanFeedUrlsToPriorityQueue
```

or:

```bash
npm run dev:scripts:addAllPublicFeedUrlsToQueue
```

or:

```bash
npm run dev:scripts:addNonPodcastIndexFeedUrlsToPriorityQueue
```

or, to add all public feeds that were recently updated (according to Podcast Index) to the priority queue:

```bash
yarn dev:scripts:addRecentlyUpdatedFeedUrlsToPriorityQueue
```

After you have added feed urls to a queue, you can retrieve and then parse
the feed urls by running:

```bash
npm run dev:scripts:parseFeedUrlsFromQueue <restartTimeOut> <queueType>
# restartTimeOut is in milliseconds; queueType is optional, and its only accepted value is "priority"
```
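A minimal sketch of the worker loop implied by that command, assuming `restartTimeOut` caps how long one worker process runs before a supervisor restarts it (all names here are illustrative, not the real podverse-api code):

```typescript
// Has this worker run long enough that it should exit and be restarted?
function shouldRestart(startedAtMs: number, nowMs: number, restartTimeOutMs: number): boolean {
  return nowMs - startedAtMs >= restartTimeOutMs;
}

// Pull feed urls from a queue and parse each one, stopping when the queue
// drains or when restartTimeOutMs has elapsed. Returns the number parsed.
async function parseFromQueue(
  pullNext: () => Promise<string | null>, // returns a feed url, or null when empty
  parse: (feedUrl: string) => Promise<void>,
  restartTimeOutMs: number,
  now: () => number = Date.now
): Promise<number> {
  const startedAt = now();
  let parsed = 0;
  for (;;) {
    if (shouldRestart(startedAt, now(), restartTimeOutMs)) break;
    const feedUrl = await pullNext();
    if (feedUrl === null) break; // queue drained
    await parse(feedUrl);
    parsed++;
  }
  return parsed;
}
```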

We also have a self-managed parsing queue, where we manually mark podcasts to be added to a separate queue for parsing at a regular cadence. The `Podcast.parsingPriority` property holds a value between 0 and 5: 0 is the default and means the podcast should not be added to the self-managed queue; 1 is parsed most frequently, and 5 least frequently.

At the time of writing, 3 is the value we use the most; it adds the feed to the queue every 30 minutes.

The `offset` value is optional, and probably not needed.

```bash
npm run dev:scripts:addFeedsToQueueByPriority <parsingPriority> <offset>
```
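The `parsingPriority` cadence can be pictured as a lookup like the following. Only two values come from this document — 0 ("never auto-queued") and 3 ("every 30 minutes") — the rest are assumptions for illustration:

```typescript
// Illustrative mapping from Podcast.parsingPriority to a re-parse cadence.
// Only priorities 0 and 3 are documented; the other cadences are assumed.
function parsingCadenceMinutes(parsingPriority: number): number | null {
  switch (parsingPriority) {
    case 0: return null;  // default: not added to the self-managed queue
    case 1: return 10;    // assumed: most frequent
    case 2: return 20;    // assumed
    case 3: return 30;    // per the docs: queued every 30 minutes
    case 4: return 60;    // assumed
    case 5: return 120;   // assumed: least frequent
    default: throw new Error(`invalid parsingPriority: ${parsingPriority}`);
  }
}
```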

Then to parse from the self-managed queue call:

```bash
npm run dev:scripts:parseFeedUrlsFromQueue
```

### Request Google Analytics pageview data and save to database

Below are sample commands for requesting unique pageview data from Google
Analytics, which is used throughout the site for sorting by popularity (not a
great/accurate system for popularity sorting...).

```bash
npm run dev:scripts:queryUniquePageviews -- clips month
npm run dev:scripts:queryUniquePageviews -- episodes week
npm run dev:scripts:queryUniquePageviews -- podcasts allTime
```
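Conceptually, the saved pageview counts then drive the site's popularity sorting, something like this (illustrative shape, not the real schema):

```typescript
// An item with a stored unique-pageview count (illustrative shape).
interface RankedItem {
  id: string;
  uniquePageviews: number;
}

// Sort items by stored unique pageview counts, most popular first.
// Copies before sorting so the input array is left untouched.
function sortByPopularity<T extends RankedItem>(items: T[]): T[] {
  return [...items].sort((a, b) => b.uniquePageviews - a.uniquePageviews);
}
```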

See the [podverse-ops repo](https://github.com/podverse/podverse-ops) for a sample
cron configuration for querying the Google API on a timer.
4 changes: 2 additions & 2 deletions package.json
@@ -1,6 +1,6 @@
{
"name": "podverse-api",
"version": "4.13.2",
"version": "4.13.3",
"description": "Data API, database migration scripts, and backend services for all Podverse models.",
"contributors": [
"Mitch Downey"
@@ -156,7 +156,7 @@
"class-validator": "0.14.0",
"clean-webpack-plugin": "3.0.0",
"cookie": "0.4.0",
"crypto-js": "~3.1.9-1",
"crypto-js": "~3.2.1",
"csvtojson": "^2.0.10",
"date-fns": "2.8.1",
"docker-cli-js": "2.9.0",
41 changes: 41 additions & 0 deletions src/controllers/podcast.ts
@@ -46,6 +46,45 @@ const getPodcastByPodcastIndexId = async (podcastIndexId, includeRelations = tru
return podcast
}

const getPodcastByPodcastGuid = async (podcastGuid: string, includeRelations?: boolean) => {
const repository = getRepository(Podcast)
const podcast = await repository.findOne(
{
podcastGuid,
isPublic: true
},
{
relations: includeRelations ? ['authors', 'categories', 'feedUrls'] : []
}
)

if (!podcast) {
throw new createError.NotFound('Podcast not found')
}

return podcast
}

const getPodcastByFeedUrl = async (feedUrl: string, includeRelations?: boolean) => {
const podcastId = await getPodcastIdByFeedUrl(feedUrl)
const repository = getRepository(Podcast)
const podcast = await repository.findOne(
{
id: podcastId,
isPublic: true
},
{
relations: includeRelations ? ['authors', 'categories', 'feedUrls'] : []
}
)

if (!podcast) {
throw new createError.NotFound('Podcast not found')
}

return podcast
}

const findPodcastsByFeedUrls = async (urls: string[]) => {
const foundPodcastIds = [] as any
const notFoundFeedUrls = [] as any
@@ -410,6 +449,8 @@ export {
findPodcastsByFeedUrls,
getPodcast,
getPodcasts,
getPodcastByFeedUrl,
getPodcastByPodcastGuid,
getPodcastByPodcastIndexId,
getPodcastsFromSearchEngine,
getMetadata,