Merge pull request backstage#390 from awanlin/topic/add-linguit-proce…
…ssor-module

Linguist - Added new dedicated module for the Linguist Tags Processor
awanlin authored Jul 5, 2024
2 parents 0eeb04c + 3c0afa4 commit 91eae6b
Showing 18 changed files with 1,892 additions and 128 deletions.
6 changes: 6 additions & 0 deletions workspaces/linguist/.changeset/polite-pots-smile.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
---
'@backstage-community/plugin-linguist-backend': patch
'@backstage-community/plugin-catalog-backend-module-linguist-tags-processor': patch
---

Added new dedicated module for the Linguist Tags Processor and deprecated the version in the Linguist Backend
1 change: 1 addition & 0 deletions workspaces/linguist/packages/backend/package.json
@@ -21,6 +21,7 @@
    "build-image": "docker build ../.. -f Dockerfile --tag backstage"
  },
  "dependencies": {
    "@backstage-community/plugin-catalog-backend-module-linguist-tags-processor": "workspace:^",
    "@backstage-community/plugin-linguist-backend": "workspace:^",
    "@backstage/backend-common": "^0.23.2",
    "@backstage/backend-defaults": "^0.3.3",
5 changes: 5 additions & 0 deletions workspaces/linguist/packages/backend/src/index.ts
@@ -26,6 +26,11 @@ backend.add(import('@backstage/plugin-catalog-backend/alpha'));
backend.add(
  import('@backstage/plugin-catalog-backend-module-scaffolder-entity-model'),
);
backend.add(
  import(
    '@backstage-community/plugin-catalog-backend-module-linguist-tags-processor'
  ),
);

// permission plugin
backend.add(import('@backstage/plugin-permission-backend/alpha'));
@@ -0,0 +1 @@
module.exports = require('@backstage/cli/config/eslint-factory')(__dirname);
@@ -0,0 +1,118 @@
# Linguist Tags Processor backend module for the Catalog plugin

## Overview

The Linguist Tags Processor can be added to your catalog to incorporate the language breakdown from Linguist as `metadata.tags` on your entities. This lets you filter entities in your catalog index by the languages used in their source repositories.

## Setup

To set up the Linguist Tags Processor, first run this command to add the package:

```sh
# From your Backstage root directory
yarn --cwd packages/backend add @backstage-community/plugin-catalog-backend-module-linguist-tags-processor
```

Then add the following line to your `packages/backend/src/index.ts` file:

```diff
import { createBackend } from '@backstage/backend-defaults';

const backend = createBackend();

// ... other feature additions

+ backend.add(import('@backstage-community/plugin-catalog-backend-module-linguist-tags-processor'));

backend.start();
```

### Processor Options

The processor can be configured in `app-config.yaml`. Here is an example Linguist Tags Processor configuration:

```yaml
linguist:
  tagsProcessor:
    bytesThreshold: 1000
    languageTypes: ['programming', 'markup']
    languageMap:
      Dockerfile: ''
      TSX: 'react'
    tagPrefix: 'lang:'
    cacheTTL:
      hours: 24
```

#### `languageMap`

The `languageMap` option lets you build a custom map from Linguist languages to the tags you want them to appear as. The keys should exactly match languages in the [Linguist dataset](https://github.com/github-linguist/linguist/blob/master/lib/linguist/languages.yml), and the values are the tags rendered in Backstage. These values are used as is and are not transformed further.

Keep in mind that Backstage has [character requirements for tags](https://backstage.io/docs/features/software-catalog/descriptor-format#tags-optional). If your map emits an invalid tag, it will cause an error during processing and your entity will not be processed.

If you map a key to `''`, it will not be emitted as a tag. This is useful if you want to ignore some of the Linguist languages.

```yaml
linguist:
  tagsProcessor:
    languageMap:
      # You don't want dockerfile to show up as a tag
      Dockerfile: ''
      # Be more specific about what the file is
      HCL: terraform
      # A more casual tag for a formal name
      Protocol Buffer: protobuf
```
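
The mapping rules above can be sketched roughly as follows. This is an illustration only, not the processor's actual implementation: `toTag` is a hypothetical helper, and the lowercase fallback for unmapped languages is an assumption made for the example.

```ts
// Hypothetical helper illustrating the languageMap rules; not the module's real code.
type LanguageMap = Record<string, string | undefined>;

function toTag(language: string, languageMap: LanguageMap): string | undefined {
  const mapped = languageMap[language];
  // An empty string means "do not emit a tag for this language".
  if (mapped === '') {
    return undefined;
  }
  // Mapped values are used as is. Assumption: unmapped languages
  // fall back to their lowercased name.
  return mapped ?? language.toLowerCase();
}

const languageMap: LanguageMap = {
  Dockerfile: '',
  HCL: 'terraform',
  'Protocol Buffer': 'protobuf',
};

const tags = ['Dockerfile', 'HCL', 'Protocol Buffer', 'Java']
  .map(language => toTag(language, languageMap))
  .filter((tag): tag is string => tag !== undefined);

console.log(tags); // [ 'terraform', 'protobuf', 'java' ]
```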

#### `tagPrefix`

The `tagPrefix` option lets you add a prefix to all tags created by this processor. Keep in mind that Backstage has [character requirements for tags](https://backstage.io/docs/features/software-catalog/descriptor-format#tags-optional). If your prefix produces an invalid tag, it will cause an error during processing and your entity will not be processed.

As an example, use the following config to get tags like `lang:java` instead of just `java`.

```yaml
linguist:
  tagsProcessor:
    tagPrefix: 'lang:'
```
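
If you want to sanity-check a prefix before deploying it, a small script along these lines may help. It is a sketch under stated assumptions: `withPrefix` and `isLikelyValidTag` are hypothetical helpers, and the pattern only approximates the tag rules in the Backstage documentation linked above, which remains the authoritative source.

```ts
// Assumption: an approximation of Backstage's tag rules (lowercase a-z, 0-9,
// ':', '+', '#', groups separated by '-', at most 63 characters in total).
// Check the linked descriptor-format docs for the authoritative rules.
const TAG_PATTERN = /^[a-z0-9:+#]+(-[a-z0-9:+#]+)*$/;

// Hypothetical helper: prepend the configured prefix to a tag.
function withPrefix(tag: string, tagPrefix: string = ''): string {
  return `${tagPrefix}${tag}`;
}

// Hypothetical helper: check a tag against the approximate rules above.
function isLikelyValidTag(tag: string): boolean {
  return tag.length >= 1 && tag.length <= 63 && TAG_PATTERN.test(tag);
}

console.log(withPrefix('java', 'lang:')); // lang:java
console.log(isLikelyValidTag(withPrefix('java', 'lang:'))); // true
console.log(isLikelyValidTag(withPrefix('java', 'Lang '))); // false
```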

#### `cacheTTL`

The `cacheTTL` option determines how long this processor caches the languages for an `entityRef` before refreshing them from the Linguist backend. Because this processor runs continuously, the cache limits the load placed on the Linguist database and API.

By default, this processor will cache languages for 30 minutes before refreshing from the linguist database.

You can optionally disable the cache entirely by passing in a `cacheTTL` duration of 0 minutes.

```yaml
linguist:
  tagsProcessor:
    cacheTTL: { minutes: 0 }
```
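
To make the TTL semantics concrete, here is a minimal sketch of a freshness check. It is not the processor's actual caching code: `DurationLike`, `toMillis`, and `isFresh` are hypothetical names, and `DurationLike` is a local stand-in for a few fields of `HumanDuration`.

```ts
// Local stand-in for the HumanDuration fields used in this sketch.
interface DurationLike {
  hours?: number;
  minutes?: number;
  seconds?: number;
}

// Convert a duration object to milliseconds.
function toMillis(duration: DurationLike): number {
  const seconds =
    (duration.hours ?? 0) * 3600 +
    (duration.minutes ?? 0) * 60 +
    (duration.seconds ?? 0);
  return seconds * 1000;
}

// Hypothetical freshness check: a cached entry is reused only while it is
// younger than the TTL. A TTL of zero disables the cache entirely.
function isFresh(cachedAtMs: number, nowMs: number, ttl: DurationLike): boolean {
  const ttlMs = toMillis(ttl);
  return ttlMs > 0 && nowMs - cachedAtMs < ttlMs;
}

console.log(toMillis({ minutes: 30 })); // 1800000
console.log(isFresh(0, 60000, { minutes: 30 })); // true
console.log(isFresh(0, 60000, { minutes: 0 })); // false
```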

#### `bytesThreshold`

The `bytesThreshold` option sets the number of bytes a language must exceed in a repository before this processor emits a tag for it. For example, some repositories contain short build scripts written in Bash, but you may only want the project's main language emitted. Alternatively, you can use `languageMap` to map `Shell` to `''`.

```yaml
linguist:
  tagsProcessor:
    # Ignore languages with less than 5000 bytes in a repo.
    bytesThreshold: 5000
```
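
The effect of the threshold can be sketched as a simple filter. This is an illustration, not the processor's code: `LanguageBytes` and `aboveThreshold` are hypothetical names, the byte counts are made up, and the strict "greater than" comparison is an assumption based on the wording "surpassed".

```ts
// Hypothetical shape for a per-language byte count.
interface LanguageBytes {
  name: string;
  bytes: number;
}

// Keep only languages whose byte count exceeds the threshold.
// Assumption: the comparison is strict; the real processor may differ
// for a language at exactly the threshold.
function aboveThreshold(
  languages: LanguageBytes[],
  bytesThreshold: number = 0,
): LanguageBytes[] {
  return languages.filter(language => language.bytes > bytesThreshold);
}

// Made-up sample data for illustration.
const breakdown: LanguageBytes[] = [
  { name: 'TypeScript', bytes: 120000 },
  { name: 'Shell', bytes: 800 },
];

console.log(aboveThreshold(breakdown, 5000).map(language => language.name));
// [ 'TypeScript' ]
```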

#### `languageTypes`

The `languageTypes` option controls which categories of Linguist languages are added as tags. By default, only languages of type `programming` are included, but you can pass a custom array to add other language types.

You can see the full list of languages Linguist supports, along with their types, [in the Linguist repository](https://github.com/github-linguist/linguist/blob/master/lib/linguist/languages.yml).

For example, you may also want to include languages of type `data`:

```yaml
linguist:
  tagsProcessor:
    languageTypes:
      - programming
      - data
```
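
As a rough sketch of what this filter does, assuming each detected language carries its Linguist type: `DetectedLanguage` and `filterByType` are hypothetical names, and the local `LanguageType` union simply mirrors the four values listed in the config schema rather than importing the real type.

```ts
// Local mirror of the four Linguist language types named in the config schema.
type LanguageType = 'programming' | 'data' | 'markup' | 'prose';

// Hypothetical shape for a detected language and its Linguist type.
interface DetectedLanguage {
  name: string;
  type: LanguageType;
}

// Keep only languages whose type is in the allowed list.
// The default mirrors the documented default of ['programming'].
function filterByType(
  languages: DetectedLanguage[],
  languageTypes: LanguageType[] = ['programming'],
): DetectedLanguage[] {
  return languages.filter(
    language => languageTypes.indexOf(language.type) !== -1,
  );
}

const detected: DetectedLanguage[] = [
  { name: 'TypeScript', type: 'programming' },
  { name: 'JSON', type: 'data' },
  { name: 'Markdown', type: 'prose' },
];

console.log(filterByType(detected).map(l => l.name)); // [ 'TypeScript' ]
console.log(filterByType(detected, ['programming', 'data']).map(l => l.name));
// [ 'TypeScript', 'JSON' ]
```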
@@ -0,0 +1,57 @@
## API Report File for "@backstage-community/plugin-catalog-backend-module-linguist-tags-processor"

> Do not edit this file. It is a report generated by [API Extractor](https://api-extractor.com/).
```ts
import { AuthService } from '@backstage/backend-plugin-api';
import { BackendFeatureCompat } from '@backstage/backend-plugin-api';
import { CatalogProcessor } from '@backstage/plugin-catalog-node';
import { CatalogProcessorCache } from '@backstage/plugin-catalog-node';
import { Config } from '@backstage/config';
import { DiscoveryService } from '@backstage/backend-plugin-api';
import { Entity } from '@backstage/catalog-model';
import { HumanDuration } from '@backstage/types';
import { LanguageType } from '@backstage-community/plugin-linguist-common';
import { LoggerService } from '@backstage/backend-plugin-api';

// @public (undocumented)
const catalogModuleLinguistTagsProcessor: BackendFeatureCompat;
export default catalogModuleLinguistTagsProcessor;

// @public
export class LinguistTagsProcessor implements CatalogProcessor {
  constructor(options: LinguistTagsProcessorOptions);
  // (undocumented)
  static fromConfig(
    config: Config,
    options: LinguistTagsProcessorOptions,
  ): LinguistTagsProcessor;
  // (undocumented)
  getProcessorName(): string;
  preProcessEntity(
    entity: Entity,
    _: any,
    __: any,
    ___: any,
    cache: CatalogProcessorCache,
  ): Promise<Entity>;
}

// @public
export interface LinguistTagsProcessorOptions {
  // (undocumented)
  auth: AuthService;
  bytesThreshold?: number;
  cacheTTL?: HumanDuration;
  // (undocumented)
  discovery: DiscoveryService;
  languageMap?: Record<string, string | undefined>;
  languageTypes?: LanguageType[];
  // (undocumented)
  logger: LoggerService;
  tagPrefix?: string;
}

// @public
export type ShouldProcessEntity = (entity: Entity) => boolean;
```
@@ -0,0 +1,52 @@
/*
* Copyright 2023 The Backstage Authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

import { HumanDuration } from '@backstage/types';

export interface Config {
  /** Configuration options for the linguist plugin */
  linguist?: {
    /** Options for the tags processor */
    tagsProcessor?: {
      /**
       * Determines how many bytes of a language should be in a repo
       * for it to be added as an entity tag. Defaults to 0.
       */
      bytesThreshold?: number;
      /**
       * The types of linguist languages that should be processed. Can be
       * any of "programming", "data", "markup", "prose". Defaults to ["programming"].
       */
      languageTypes?: string[];
      /**
       * A custom mapping of linguist languages to how they should be rendered as entity tags.
       * If a language is mapped to '' it will not be included as a tag.
       */
      languageMap?: {
        [language: string]: string | undefined;
      };
      /**
       * How long to cache entity languages for in memory. Used to avoid constant db hits during
       * processing. Defaults to 30 minutes.
       */
      cacheTTL?: HumanDuration;
      /**
       * An optional prefix to apply to all created tags from linguist
       */
      tagPrefix?: string;
    };
  };
}
@@ -0,0 +1,51 @@
{
  "name": "@backstage-community/plugin-catalog-backend-module-linguist-tags-processor",
  "description": "The linguist-tags-processor backend module for the catalog plugin.",
  "version": "0.1.0",
  "main": "src/index.ts",
  "types": "src/index.ts",
  "license": "Apache-2.0",
  "publishConfig": {
    "access": "public",
    "main": "dist/index.cjs.js",
    "types": "dist/index.d.ts"
  },
  "repository": {
    "type": "git",
    "url": "https://github.com/backstage/community-plugins",
    "directory": "workspaces/linguist/plugins/catalog-backend-module-linguist-tags-processor"
  },
  "backstage": {
    "role": "backend-plugin-module"
  },
  "scripts": {
    "start": "backstage-cli package start",
    "build": "backstage-cli package build",
    "lint": "backstage-cli package lint",
    "test": "backstage-cli package test",
    "clean": "backstage-cli package clean",
    "prepack": "backstage-cli package prepack",
    "postpack": "backstage-cli package postpack"
  },
  "dependencies": {
    "@backstage-community/plugin-linguist-common": "workspace:^",
    "@backstage/backend-common": "^0.21.7",
    "@backstage/backend-plugin-api": "^0.6.17",
    "@backstage/catalog-model": "^1.4.5",
    "@backstage/config": "^1.2.0",
    "@backstage/plugin-catalog-node": "^1.11.1",
    "@backstage/types": "^1.1.1",
    "node-fetch": "^2.6.7"
  },
  "devDependencies": {
    "@backstage/backend-tasks": "^0.5.22",
    "@backstage/backend-test-utils": "^0.3.7",
    "@backstage/cli": "^0.26.3",
    "js-yaml": "^4.1.0",
    "linguist-js": "^2.5.3"
  },
  "files": [
    "dist"
  ],
  "configSchema": "config.d.ts"
}
@@ -0,0 +1,8 @@
/**
* The Linguist Tags Processor backend module for the Catalog plugin.
*
* @packageDocumentation
*/

export { catalogModuleLinguistTagsProcessor as default } from './module';
export * from './processor';
@@ -0,0 +1,28 @@
import {
  coreServices,
  createBackendModule,
} from '@backstage/backend-plugin-api';
import { catalogProcessingExtensionPoint } from '@backstage/plugin-catalog-node/alpha';
import { LinguistTagsProcessor } from './processor';

/** @public */
export const catalogModuleLinguistTagsProcessor = createBackendModule({
  pluginId: 'catalog',
  moduleId: 'linguist-tags-processor',
  register(reg) {
    reg.registerInit({
      deps: {
        catalog: catalogProcessingExtensionPoint,
        config: coreServices.rootConfig,
        logger: coreServices.logger,
        discovery: coreServices.discovery,
        auth: coreServices.auth,
      },
      async init({ catalog, config, logger, discovery, auth }) {
        catalog.addProcessor(
          LinguistTagsProcessor.fromConfig(config, { logger, discovery, auth }),
        );
      },
    });
  },
});