2. Sources

Brian Dashore edited this page Apr 26, 2023 · 2 revisions

2a. Creating a source object

Sources provide information to Ferrite from external websites, RSS feeds, or APIs. Prefer APIs and RSS feeds over scraping, as those methods do not overload a website.

Example

- name: My cool source
  version: 1
  minVersion: '0.7'
  about: >-
    # Folded about description
  website: https://mycoolwebsite.com
  tags:
    # YAML sequence of tag blocks
    - name: A cool tag
      color: # A hex value
  trackers:
    # YAML sequence of URLs
    - udp://tracker1.com/announce
    - http://tracker2.net/announce
  api:
    # Add API info here
  jsonParser:
    # Add JSON parser here
  rssParser:
    # Add RSS parser here
  htmlParser:
    # Add HTML parser here

Fields

name

Required String: The source's name.

version

Required Integer: This is the version number of the source. Each update to the source increments the version by 1. Only increment the version when you are sure that the source is ready to be published.

minVersion

Optional String: The minimum app version this source can run on. YAML plugins only run on v0.7 or above by default.

about

Optional String: A short description of the source, shown in the source-specific settings. Line folding is recommended, as shown in the template.

website

Optional String: The base URL of the website. For example, https://google.com is the base URL of Google. Do NOT include a trailing slash; otherwise the source will break. Required if dynamicWebsite is not used.

dynamicWebsite

Optional Boolean: Marks if the website can be filled in the source settings by the user. Used for locally hosted sources.
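
As a rough sketch, a source for a locally hosted service might omit website and set dynamicWebsite instead, so that the user supplies the base URL in the source settings. The source name below is hypothetical, and the parser blocks are left out for brevity.

```yaml
# Hypothetical source for a self-hosted service
- name: My local source
  version: 1
  minVersion: '0.7'
  dynamicWebsite: true
  # No website key here: the user enters the base URL in the source settings
```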

tags

Optional Array<Tag>: Please see Tag documentation.

trackers

Optional Array<String>: If only a magnet hash field is provided, this field is used to construct a magnet link. Trackers are provided as an array of strings and the best way to get them is to decode a magnet link from the website itself. Magnet links usually contain trackers after an &tr part of the URL.
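
To illustrate, here is a sketch of how the trackers in a magnet link map onto this field. The magnet link in the comment is made up for the example; each tracker appears URL-encoded after a tr= parameter, so it has to be decoded (%3A becomes : and %2F becomes /) before it goes into the list.

```yaml
# Hypothetical magnet link copied from a website:
#   magnet:?xt=urn:btih:0123456789abcdef0123456789abcdef01234567&dn=example
#     &tr=udp%3A%2F%2Ftracker1.com%2Fannounce
#     &tr=http%3A%2F%2Ftracker2.net%2Fannounce
trackers:
  - udp://tracker1.com/announce    # decoded from the first tr= parameter
  - http://tracker2.net/announce   # decoded from the second tr= parameter
```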

jsonParser

Optional JsonParser: Allows for parsing of API payloads. Always prefer this over HTML parsers!

rssParser

Optional RssParser: Allows for parsing of RSS feeds. Always prefer this over HTML parsers!

htmlParser

Optional HtmlParser: The web scraping module for a source. Use this if a source does not have an API and allows scraping!

2b. The API info block

API information is required if you want to query from a website that contains API routes. These are also used for aggregate sources that run a local server.

Example

api:
  apiUrl: # base URL for API routes
  clientId: # client ID or username credential
  clientSecret: # client secret or password credential

Fields

apiUrl

Optional String: The base URL for API routes.

clientId

Optional ApiCredential: If an API wants a username or client ID, add a block here.

clientSecret

Optional ApiCredential: If an API wants a password or client secret, add a block here.
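
A minimal sketch of an API block, assuming an API that needs a key which each user must supply themselves. The URL is a placeholder, and the dynamic key is one of the API credential fields described in 2f.

```yaml
api:
  apiUrl: https://api.mycoolwebsite.com   # hypothetical base URL for API routes
  clientId:
    dynamic: true   # the user enters this credential in the source's settings
```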

2b. The JSON parser

JSON parsers are used for parsing JSON API responses from websites. Writing this module requires some understanding of JSON fields. It is paired with the API info block.

Here is a template for a JSON parser with all the fields filled out. Remove any optional field you do not need.

Example

jsonParser:
  searchUrl: # Search path
  results: # Results located on first JSON layer
  subResults: # Results located on second JSON layer
  magnetHash:
    # Complex query
  magnetLink:
    # Complex query
  subName:
    # Complex query
  title:
    # Complex query
  size:
    # Complex query
  sl:
    seeders: # Seeder key name
    leechers: # Leecher key name

Fields

searchUrl

Optional String: The URL path used when querying an API. For example, given a URL such as https://www.google.com/search?q=hello, the search URL is whatever comes after the base URL (in this case /search?q=hello). Include the slash at the beginning; otherwise the source will break.

results

Optional String: JSON field for a results array. Most API results are compacted in an array of JSON objects.

subResults

Optional String: JSON field for results on the second layer of JSON for a result.

magnetHash

Optional ComplexQuery: JSON key for the location of a magnet hash.

magnetLink

Optional ComplexQuery: JSON key for the location of a magnet link.

subName

Optional ComplexQuery: Used for aggregate sources. Adds the original website name where the aggregate source fetched the item from.

title

Optional ComplexQuery: JSON key for the location of a result title.

size

Optional ComplexQuery: JSON key for the size of a result.

sl

Optional: Used to get seeder and leecher values for an item. All of the properties below are optional.

  • seeders String: The JSON result key for seeders

  • leechers String: The JSON result key for leechers
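
The fields above can be sketched against a made-up API response. Everything in this example is hypothetical: the endpoint, the key names in the payload, and the use of {query} in searchUrl (which is only documented for the RSS and HTML parsers). Because only a magnet hash is mapped, the source would also need a trackers list to build a magnet link.

```yaml
# Hypothetical response from /search?q=hello:
#   {
#     "data": [
#       { "name": "Example item",
#         "info_hash": "0123456789abcdef0123456789abcdef01234567",
#         "size": "700 MB", "seeders": 12, "leechers": 3 }
#     ]
#   }
jsonParser:
  searchUrl: /search?q={query}   # assumption: {query} works here as in the other parsers
  results: data                  # the array on the first JSON layer
  title:
    query: name
  magnetHash:
    query: info_hash
  size:
    query: size
  sl:
    seeders: seeders
    leechers: leechers
```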

2c. The RSS parser

RSS parsers are used for parsing RSS feeds from websites. Writing this module doesn't require any prior knowledge and shouldn't be difficult to pick up.

Here is a template for an RSS parser with all the fields filled out. Remove any optional field you do not need.

Example

rssParser:
  rssUrl: # Base URL of the RSS feed
  searchUrl: # Search path
  items: # Items selector
  magnetHash:
    # Complex query
  magnetLink:
    # Complex query
  subName:
    # Complex query
  title:
    # Complex query
  size:
    # Complex query
  sl:
    seeders: # Seeder tag name/value (if discriminator present)
    leechers: # Leecher tag name/value (if discriminator present)
    discriminator: # Complex query discriminator
    attribute: # Seeder and leecher value tag name

Fields

rssUrl

Optional String: Base URL of the RSS feed. Some websites use domains such as feed.website.com for RSS feeds as opposed to website.com.

searchUrl

Optional String: The URL path used when searching content on a feed. For example, given a URL such as https://www.google.com/search?q=hello, the search URL is whatever comes after the base URL (in this case /search?q=hello). Include the slash at the beginning; otherwise the source will break.

Parameters:

  • {query} Var: Replaced with the user's URL-encoded search query

items

Required String: The tag name for items in an RSS XML document. This is usually called item.

magnetHash

Optional ComplexQuery: The tag name for a magnet hash (if present).

magnetLink

Optional ComplexQuery: The tag name for a magnet link (if present). If a magnetLink field isn't provided, a magnetHash field must be provided with trackers.

subName

Optional ComplexQuery: Used for aggregate sources. Adds the original website name where the aggregate source fetched the item from.

title

Optional ComplexQuery: The tag name for the title of an item.

size

Optional ComplexQuery: The tag name for the size of an item. It doesn't matter if the size is in an integer format; Ferrite will convert it into the proper format (GB, MB, etc).

sl

Optional: Used to get seeder and leecher values for an item. All of the properties below are optional.

Arguments:

  • seeders String: The tag name for seeders

  • leechers String: The tag name for leechers

  • discriminator String: A replication of discriminator from complex queries

  • attribute String: Tag name used for both seeders and leechers
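
As an illustration, here is a sketch of an RSS parser written against a made-up feed. The feed URL and the tag names are hypothetical, and the example assumes the complex query attribute can be left at its default (text) when the value is a tag's text content.

```yaml
# Hypothetical item from https://feed.mycoolwebsite.com/rss?q=hello:
#   <item>
#     <title>Example item</title>
#     <link>magnet:?xt=urn:btih:0123456789abcdef0123456789abcdef01234567</link>
#     <size>734003200</size>
#     <seeders>12</seeders>
#     <leechers>3</leechers>
#   </item>
rssParser:
  rssUrl: https://feed.mycoolwebsite.com   # feed lives on a separate subdomain
  searchUrl: /rss?q={query}
  items: item
  title:
    query: title
  magnetLink:
    query: link
  size:
    query: size        # integer sizes are converted by Ferrite
  sl:
    seeders: seeders
    leechers: leechers
```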

2d. The HTML parser

HTML parsers are used for web scraping. Writing this module requires an understanding of web scraping and of basic DOM methods such as querySelector and querySelectorAll, which are useful for testing selectors.

Here is a template for an HTML parser with all the fields filled out. Remove any optional field you do not need.

Example

htmlParser:
  searchUrl: # Search URL path
  rows: # Row selector
  magnet:
    # Complex query
    externalLinkQuery: # If a magnet is on a different webpage
  subName:
    # Complex query
  title:
    # Complex query
  size:
    # Complex query
  sl:
    seeders: # Seeder selector
    leechers: # Leecher selector
    combined: # Combined seeder and leecher string selector
    attribute: # Complex query attribute
    seederRegex: # Regex for seeder string
    leecherRegex: # Regex for leecher string

Fields

searchUrl

Required String: The URL path used when searching content on a website. For example, given a URL such as https://www.google.com/search?q=hello, the search URL is whatever comes after the base URL (in this case /search?q=hello). Include the slash at the beginning; otherwise the source will break.

Parameters:

  • {query} Var: Replaced with the user's URL-encoded search query

rows

Required String: The CSS selector for selecting a table row. Most of these sites use HTML tables. Please consult this while web scraping.

magnet

Required ComplexQuery: The HTML parser only looks for magnet links.

Extra arguments:

  • externalLinkQuery String: If a magnet link is located on a different page, this fetches the URL required to navigate to that page and fetch the magnet link. This makes the source slower as the number of search results grows, so please add the Slow tag if using this argument.
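
A possible shape for a magnet block that uses externalLinkQuery, assuming a site where each row only links to a details page and the magnet link sits on that page. Both selectors are hypothetical.

```yaml
magnet:
  query: 'a[href^="magnet:"]'       # hypothetical: the magnet anchor on the details page
  attribute: href
  externalLinkQuery: 'td.name > a'  # hypothetical: the link in each row leading to that page
```

Since every result needs an extra page fetch, a source written this way should also carry the Slow tag in its tags list.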

subName

Optional ComplexQuery: Used for aggregate sources. Adds the original website name where the aggregate source fetched the item from.

title

Optional ComplexQuery: Follows complex query spec. No unique parameters.

size

Optional ComplexQuery: Follows complex query spec. No unique parameters.

sl (seeders and leechers)

Optional: Used to get seeder and leecher values on a website. All the below properties are optional.

Arguments:

  • seeders String: The seeder CSS selector
  • leechers String: The leecher CSS selector
  • combined String: A CSS selector used when seeders and leechers are in one string (ex. Seeders: 100 / Leechers: 200)
  • attribute String: Tag name used for both seeders and leechers
  • seederRegex String: Regex used to strip the seeder value from a string (follows the same rules as complex query regexes)
  • leecherRegex String: Regex used to strip the leecher value from a string (follows the same rules as complex query regexes)
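
Putting the fields together, here is a sketch of an HTML parser for a made-up results table. The markup, class names, and search path are all hypothetical. The combined variant is shown as a commented alternative, using the example string from the list above.

```yaml
# Hypothetical row in the search results page:
#   <table class="results"><tbody>
#     <tr>
#       <td class="name"><a href="/item/1">Example item</a></td>
#       <td class="links"><a href="magnet:?xt=urn:btih:0123456789abcdef0123456789abcdef01234567">Magnet</a></td>
#       <td class="size">700 MB</td>
#       <td class="seeders">100</td>
#       <td class="leechers">200</td>
#     </tr>
#   </tbody></table>
htmlParser:
  searchUrl: /search?q={query}
  rows: 'table.results > tbody > tr'
  magnet:
    query: 'td.links > a'
    attribute: href
  title:
    query: 'td.name > a'
    attribute: text
  size:
    query: td.size
    attribute: text
  sl:
    seeders: td.seeders
    leechers: td.leechers
    # Alternative if one cell held "Seeders: 100 / Leechers: 200":
    # combined: td.peers
    # seederRegex: "Seeders: (\\d+)"
    # leecherRegex: "Leechers: (\\d+)"
```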

2e. The complex query

These are generic queries used by Ferrite for keys that require a little more information when parsing the contents.

Any key that has ComplexQuery as a tag will always use these parameters.

Here is a template for a complex query with all the parameters filled in. Remove any optional parameter you do not need.

Example

query: # CSS selector for the scraper
discriminator: # For RSS and JSON - tag discrimination
attribute: # Name or value of tag depending on discriminator presence
regex: # Regex string

Fields

query

Required String: The CSS selector for selecting the element in question.

attribute

Required String: The attribute to look for after selecting the query (ex. href, title, span). The default value is text for getting a tag's textContent.

discriminator

Optional String: Used with RSS and JSON parsers.

If an RSS tag is formatted like <attr name="magnet" value="magnetLinkHere"> and you want the value, the discriminator is name and the attribute is value.

If some JSON parameters return similar values, such as a title, a discriminator can be used to separate entries that have the same title but other differing parameters.

regex

Optional String: Runs regex on the query result before presentation to the user.

  • Do not include the beginning and end slashes in this string (ex. /regex/)

  • When using a \ character, escape it using \\

  • If a regex does not have a capturing group, it is assumed to only check for a match

  • The regex must have exactly one capturing group if you want to return data. A capturing group is the part of the pattern wrapped in parentheses.
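
For instance, a complex query with a regex might look like the sketch below. The selector and the cell text are hypothetical. The string is double-quoted so that \\ reaches the regex engine as a single backslash, and the pattern has one capturing group, which here would select 1.4 GiB from the cell text.

```yaml
# Hypothetical cell text: "Size: 1.4 GiB, uploaded 2 days ago"
size:
  query: td.info
  attribute: text
  regex: "Size: ([\\d.]+ [KMGT]i?B)"   # no surrounding slashes; backslashes doubled
```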

2f. The API credential

These are generic queries used for grabbing and storing API credentials. Most of these fields are not needed, but some APIs require way too much authentication for simple queries.

Example

url: # URL to grab the credential
dynamic: # Should the credential be dynamically set by the user
expiryLength: # How long the credential will last
responseType: # Response type from the URL
query: # Where in the response the credential will be located

Fields

url

Optional String: If there is a separate URL to grab the credential from, enter it here.

dynamic

Optional Boolean: Indicates if the credential should be entered by the user in the source's settings.

expiryLength

Optional Integer: How long the credential lasts for.

responseType

Optional String: What the response is from the credential URL (Ex. json).

query

Optional String: Field where the credential is located from a URL response.
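
As a sketch, a credential fetched from a separate endpoint could be described like this. The endpoint, the response shape, and the expiry value are assumptions made for the example; in particular, the unit of expiryLength is not specified above.

```yaml
# Hypothetical response from the token endpoint:
#   { "token": "abc123" }
clientSecret:
  url: https://api.mycoolwebsite.com/token   # hypothetical URL to grab the credential
  responseType: json
  query: token          # assumed: the key in the response that holds the credential
  expiryLength: 3600    # assumed value; check which unit Ferrite expects
```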