feat: add Astro DB #304

Draft
wants to merge 9 commits into
base: main

Conversation

@ViorelMocanu (Owner) commented Aug 12, 2024

📚 Description

This PR adds Astro DB as a data source, both for local development (seeded from CSV files) and for a remote Astro Studio connection.

It stays a draft until the resource page and all the taxonomy pages are functional. Everything else will have to be optimized in follow-up PRs.

Known issues include:

  • Astro Studio (remote) is populated with the DB structure but not with the contents of the CSV files read in seed.ts. Fetching the files over HTTP instead of reading them from the file system might help
  • the GitHub Actions for the DB are failing
  • several TSC warnings are left to fix
  • etc

I should fix them before merging this.
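
A possible direction for the first known issue, as a rough sketch only: keep the CSV parsing independent of where the text comes from, so the same seeding logic can read from the file system locally and from an HTTP fetch elsewhere. The names `parseCsv`, `loadRows` and `TextSource` are hypothetical and not part of this PR, and it is only an assumption that file-system access is what leaves the remote tables empty.

```typescript
/** One CSV row keyed by column header. */
type CsvRow = Record<string, string>;

/** Any async function that yields the raw CSV text (hypothetical abstraction). */
type TextSource = () => Promise<string>;

/**
 * Parse CSV text into rows. Handles quoted fields, escaped quotes ("")
 * and commas or line breaks inside quotes.
 */
function parseCsv(text: string): CsvRow[] {
	const records: string[][] = [];
	let record: string[] = [];
	let field = '';
	let inQuotes = false;

	for (let i = 0; i < text.length; i++) {
		const ch = text[i];
		if (inQuotes) {
			if (ch === '"' && text[i + 1] === '"') {
				field += '"'; // escaped quote inside a quoted field
				i++;
			} else if (ch === '"') inQuotes = false;
			else field += ch;
		} else if (ch === '"') {
			inQuotes = true;
		} else if (ch === ',') {
			record.push(field);
			field = '';
		} else if (ch === '\n' || ch === '\r') {
			if (ch === '\r' && text[i + 1] === '\n') i++; // treat CRLF as one line break
			record.push(field);
			field = '';
			records.push(record);
			record = [];
		} else {
			field += ch;
		}
	}
	// Flush the last record when the text does not end with a line break.
	if (field !== '' || record.length > 0) {
		record.push(field);
		records.push(record);
	}

	const [header = [], ...rows] = records;
	return rows
		.filter((r) => r.some((v) => v !== '')) // skip blank lines
		.map((r) => Object.fromEntries(header.map((h, i) => [h, r[i] ?? ''])));
}

/** Load rows from any source: the caller decides between file system and network. */
async function loadRows(read: TextSource): Promise<CsvRow[]> {
	return parseCsv(await read());
}

// Possible callers (paths and URLs are placeholders):
//   loadRows(() => fs.promises.readFile('some/file.csv', 'utf8'))
//   loadRows(() => fetch('https://example.com/file.csv').then((r) => r.text()))
```

Passing the reader in as a function keeps the parser free of Node-only imports, which may matter if the remote seeding step runs somewhere the CSV files are not on disk.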

🔗 Linked issue(s)

Fixes #12 (Figure out initial database solution).
Fixes #73 (Refactor getStaticPaths).
Fixes #76 (Add all document links).
Fixes #100 (YouTube Embeds).
Fixes #137 (Refactor ResourceList and ResourceTOC).
Fixes #159 (Add Supabase).
Fixes #160 (Test Supabase as initial storage solution).
Fixes #170 (Redirect table in Supabase).
Fixes #284 (Test AstroDB as initial storage solution).

❓ Type of change

  • 🐞 Bug fix (non-breaking change which fixes an issue)
  • 📖 Content update (text or image-related updates)
  • 🖼️ Design update (UI design and static template related change)
  • 🆕 New page (addition or major edit for an existing page layout)
  • 📓 Documentation update (improvements or additions to documentation)
  • 👌 Enhancement (improving an existing functionality, like performance)
  • ✨ New feature (a non-breaking change that adds functionality)
  • 🛠️ Operational update (changes to configuration, CI/CD, procedures, etc.)
  • 🛡️ Security update (a fix or update required to alleviate a cyber security issue)
  • ⚠️ Breaking change (fix or feature that would cause existing functionality to not work as expected)

📄 Changelog

  • cleanup and editing of operational files
  • update all packages and add PNPM action to ease updating via pnpm up
  • add some additional packages and activate experimental features like ViewTransition and Astro Server Islands as a POC
  • build the seed data: scrape the document for links, scrape each link for its meta title and description where possible, generate taxonomy and resource CSVs with ChatGPT, clean them up, upload them to Google Sheets as a visual database, fill in the missing info, and download the result as the final CSV for seeding the database
  • insert Astro DB into configurations and use it to store data
  • steal good code from the Supabase branch and factor it into the current one
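
The scraping step in the changelog is not shown in this conversation. As a hedged illustration of what "get meta titles and descriptions" could look like, here is a minimal sketch that pulls both out of an HTML string. `extractMeta`, `PageMeta` and `decodeEntities` are hypothetical names, and the regex approach is a simplification: a real HTML parser would cope better with unusual markup.

```typescript
interface PageMeta {
	title: string | null;
	description: string | null;
}

/** Decode the handful of HTML entities that commonly show up in titles. */
function decodeEntities(s: string): string {
	return s
		.replace(/&lt;/g, '<')
		.replace(/&gt;/g, '>')
		.replace(/&quot;/g, '"')
		.replace(/&#39;/g, "'")
		.replace(/&amp;/g, '&'); // last, so "&amp;lt;" ends up as "&lt;" and not "<"
}

/** Extract <title> and <meta name="description"> from raw HTML, or null when absent. */
function extractMeta(html: string): PageMeta {
	const title = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html)?.[1] ?? null;

	let description: string | null = null;
	for (const tag of html.match(/<meta\b[^>]*>/gi) ?? []) {
		const name = /\bname\s*=\s*["']([^"']*)["']/i.exec(tag)?.[1];
		if (name?.toLowerCase() !== 'description') continue;
		const content = /\bcontent\s*=\s*(?:"([^"]*)"|'([^']*)')/i.exec(tag);
		description = content?.[1] ?? content?.[2] ?? null;
		break;
	}

	// Collapse whitespace and turn empty strings into null.
	const clean = (s: string | null): string | null =>
		s === null ? null : decodeEntities(s).replace(/\s+/g, ' ').trim() || null;

	return { title: clean(title), description: clean(description) };
}
```

Returning `null` for missing values matches the "where possible" wording above: rows without metadata can then be filled in by hand later.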

✅ Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings or errors
  • I have manually tested the newly added functionality with no discernable errors
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules

@ViorelMocanu added labels on Aug 12, 2024:
  • enhancement: New feature or request
  • design: A task related to UI design and static templating implementation.
  • page: An entire page layout to create, including static HTML, CSS and templating.
  • ops: Operational updates: configs, CI/CD, procedures, project
  • content: Text or image-related updates
  • dependencies: Pull requests that update a dependency file
  • refactor: Remake an existing functionality to KISS and DRY and make it better.
  • javascript: Pull requests that update Javascript code
@ViorelMocanu added this to the v0.5 milestone Aug 12, 2024
@ViorelMocanu self-assigned this Aug 12, 2024

vercel bot commented Aug 12, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Skipped Deployment
Name | Status | Preview | Comments | Updated (UTC)
digital-resources | ⬜️ Ignored (Inspect) | Visit Preview | | Oct 7, 2024 10:48am

db/seed.ts Fixed
u.startsWith('https://vimeo.com/')
)
resourceType = 4;
else if (u.indexOf('udemy.com/') > -1 || u.indexOf('course') > -1) resourceType = 3;

Check failure

Code scanning / CodeQL

Incomplete URL substring sanitization High

'udemy.com/' can be anywhere in the URL, and arbitrary hosts may come before or after it.
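
To make the alert concrete, here is a small sketch that contrasts the flagged substring test with a comparison on the parsed hostname. The function names and sample URLs are made up for illustration and do not appear in db/seed.ts.

```typescript
/** The flagged pattern: true if 'udemy.com/' occurs anywhere in the string. */
function looksLikeUdemyBySubstring(url: string): boolean {
	return url.toLowerCase().indexOf('udemy.com/') > -1;
}

/** Host-based check: parse the URL first, then compare the hostname exactly. */
function isUdemyByHost(url: string): boolean {
	try {
		const { hostname } = new URL(url);
		return hostname === 'udemy.com' || hostname === 'www.udemy.com';
	} catch {
		return false; // not an absolute URL, so it cannot be a Udemy link
	}
}

const samples = [
	'https://www.udemy.com/course/example/', // substring: true, host: true
	'https://evil.example/udemy.com/', // substring: true, host: false (text is in the path)
	'https://notudemy.com/', // substring: true, host: false (different domain)
];

for (const s of samples) {
	console.log(s, looksLikeUdemyBySubstring(s), isUdemyByHost(s));
}
```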
)
resourceType = 4;
else if (u.indexOf('udemy.com/') > -1 || u.indexOf('course') > -1) resourceType = 3;
else if (u.indexOf('amazon.com/') > -1 || u.indexOf('pdf') > -1 || u.indexOf('book') > -1) resourceType = 2;

Check failure

Code scanning / CodeQL

Incomplete URL substring sanitization High

'amazon.com/' can be anywhere in the URL, and arbitrary hosts may come before or after it.

Copilot Autofix AI 3 months ago

To fix the problem, we need to parse the URL and check its host against a whitelist of allowed hosts. This ensures that the host is exactly what we expect and not just a substring within a potentially malicious URL.

  • Use the url module to parse the URL and extract the host.
  • Define a whitelist of allowed hosts.
  • Check if the parsed host is in the whitelist before categorizing the resource.
Suggested changeset 1
db/seed.ts

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/db/seed.ts b/db/seed.ts
--- a/db/seed.ts
+++ b/db/seed.ts
@@ -1,2 +1,3 @@
 import * as fs from 'fs';
+import { URL } from 'url';
 import { Author, Like, LikeTest, NOW, Rating, RelationResourceTag, Resource, ResourceType, Tag, TagType, Taxonomy, TaxonomyType, User, Visits, db } from 'astro:db';
@@ -305,15 +306,25 @@
 			const u = r.url.toLowerCase();
-			if (
-				u.startsWith('https://youtube.com/') ||
-				u.startsWith('http://youtube.com/') ||
-				u.startsWith('https://www.youtube.com/') ||
-				u.startsWith('http://www.youtube.com/') ||
-				u.startsWith('https://youtu.be/') ||
-				u.startsWith('https://vimeo.com/')
-			)
-				resourceType = 4;
-			else if (u.indexOf('udemy.com/') > -1 || u.indexOf('course') > -1) resourceType = 3;
-			else if (u.indexOf('amazon.com/') > -1 || u.indexOf('pdf') > -1 || u.indexOf('book') > -1) resourceType = 2;
-			else if (u.indexOf('github.com/') > -1 || u.indexOf('gitlab.com/') > -1) resourceType = 5;
-			else if (u.indexOf('medium.com/') > -1 || u.indexOf('dev.to/') > -1 || u.indexOf('blog') > -1 || t.indexOf('blog') > -1 || u.indexOf('.pdf') > -1) resourceType = 6;
+			const parsedUrl = new URL(u);
+			const host = parsedUrl.host;
+			const allowedHosts = {
+				'youtube.com': 4,
+				'www.youtube.com': 4,
+				'youtu.be': 4,
+				'vimeo.com': 4,
+				'udemy.com': 3,
+				'amazon.com': 2,
+				'github.com': 5,
+				'gitlab.com': 5,
+				'medium.com': 6,
+				'dev.to': 6
+			};
+			if (allowedHosts[host]) {
+				resourceType = allowedHosts[host];
+			} else if (u.indexOf('course') > -1) {
+				resourceType = 3;
+			} else if (u.indexOf('pdf') > -1 || u.indexOf('book') > -1) {
+				resourceType = 2;
+			} else if (u.indexOf('blog') > -1 || t.indexOf('blog') > -1 || u.indexOf('.pdf') > -1) {
+				resourceType = 6;
+			}
 			r.resource_type_id = !!row.resource_type_id ? row.resource_type_id : resourceType;
EOF
Copilot is powered by AI and may make mistakes. Always verify output.
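
In that spirit, a few points in the suggested patch seem worth checking before applying it: `new URL()` throws on strings that are not absolute URLs, `host` includes the port when one is present, and the trailing `u.indexOf('.pdf')` test can never change the outcome because the earlier `pdf` test already matches the same strings. The sketch below is one possible variant under those assumptions, not the project's actual code. `classifyResource`, `HOST_TYPES` and the `fallback` parameter are hypothetical names; the numeric ids are simply the ones visible in the patch.

```typescript
// Host to resource type id, using the ids that appear in the patch above.
const HOST_TYPES: ReadonlyMap<string, number> = new Map([
	['youtube.com', 4],
	['www.youtube.com', 4],
	['youtu.be', 4],
	['vimeo.com', 4],
	['udemy.com', 3],
	['amazon.com', 2],
	['github.com', 5],
	['gitlab.com', 5],
	['medium.com', 6],
	['dev.to', 6],
]);

/**
 * Pick a resource type id from the URL's hostname, falling back to the
 * keyword heuristics and finally to the caller-supplied default.
 */
function classifyResource(url: string, title: string, fallback: number): number {
	const u = url.toLowerCase();
	const t = title.toLowerCase();

	let hostname = '';
	try {
		hostname = new URL(u).hostname; // hostname excludes the port, unlike host
	} catch {
		// Not an absolute URL: skip the host lookup, keep the keyword heuristics.
	}

	const byHost = HOST_TYPES.get(hostname);
	if (byHost !== undefined) return byHost;
	if (u.includes('course')) return 3;
	if (u.includes('pdf') || u.includes('book')) return 2;
	if (u.includes('blog') || t.includes('blog')) return 6;
	return fallback;
}
```

A `Map` lookup also sidesteps indexing an object literal with an arbitrary string, which strict TypeScript settings tend to reject.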

resourceType = 4;
else if (u.indexOf('udemy.com/') > -1 || u.indexOf('course') > -1) resourceType = 3;
else if (u.indexOf('amazon.com/') > -1 || u.indexOf('pdf') > -1 || u.indexOf('book') > -1) resourceType = 2;
else if (u.indexOf('github.com/') > -1 || u.indexOf('gitlab.com/') > -1) resourceType = 5;

Check failure

Code scanning / CodeQL

Incomplete URL substring sanitization High

'github.com/' can be anywhere in the URL, and arbitrary hosts may come before or after it.
resourceType = 4;
else if (u.indexOf('udemy.com/') > -1 || u.indexOf('course') > -1) resourceType = 3;
else if (u.indexOf('amazon.com/') > -1 || u.indexOf('pdf') > -1 || u.indexOf('book') > -1) resourceType = 2;
else if (u.indexOf('github.com/') > -1 || u.indexOf('gitlab.com/') > -1) resourceType = 5;

Check failure

Code scanning / CodeQL

Incomplete URL substring sanitization High

'gitlab.com/' can be anywhere in the URL, and arbitrary hosts may come before or after it.
else if (u.indexOf('udemy.com/') > -1 || u.indexOf('course') > -1) resourceType = 3;
else if (u.indexOf('amazon.com/') > -1 || u.indexOf('pdf') > -1 || u.indexOf('book') > -1) resourceType = 2;
else if (u.indexOf('github.com/') > -1 || u.indexOf('gitlab.com/') > -1) resourceType = 5;
else if (u.indexOf('medium.com/') > -1 || u.indexOf('dev.to/') > -1 || u.indexOf('blog') > -1 || t.indexOf('blog') > -1 || u.indexOf('.pdf') > -1) resourceType = 6;

Check failure

Code scanning / CodeQL

Incomplete URL substring sanitization High

'medium.com/' can be anywhere in the URL, and arbitrary hosts may come before or after it.

Copilot Autofix AI 3 months ago

To fix the problem, we need to parse the URL and check the host component explicitly against a whitelist of allowed hosts. This approach ensures that the check is accurate and not prone to substring matching issues.

  1. Parse the URL using a URL parsing library to extract the host component.
  2. Define a whitelist of allowed hosts.
  3. Check if the parsed host is in the whitelist.
  4. Update the resource type assignment logic to use this secure check.
Suggested changeset 1
db/seed.ts

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/db/seed.ts b/db/seed.ts
--- a/db/seed.ts
+++ b/db/seed.ts
@@ -304,16 +304,16 @@
 			const t = r.title.toLowerCase();
-			const u = r.url.toLowerCase();
-			if (
-				u.startsWith('https://youtube.com/') ||
-				u.startsWith('http://youtube.com/') ||
-				u.startsWith('https://www.youtube.com/') ||
-				u.startsWith('http://www.youtube.com/') ||
-				u.startsWith('https://youtu.be/') ||
-				u.startsWith('https://vimeo.com/')
-			)
-				resourceType = 4;
-			else if (u.indexOf('udemy.com/') > -1 || u.indexOf('course') > -1) resourceType = 3;
-			else if (u.indexOf('amazon.com/') > -1 || u.indexOf('pdf') > -1 || u.indexOf('book') > -1) resourceType = 2;
-			else if (u.indexOf('github.com/') > -1 || u.indexOf('gitlab.com/') > -1) resourceType = 5;
-			else if (u.indexOf('medium.com/') > -1 || u.indexOf('dev.to/') > -1 || u.indexOf('blog') > -1 || t.indexOf('blog') > -1 || u.indexOf('.pdf') > -1) resourceType = 6;
+			const u = new URL(r.url.toLowerCase());
+			const host = u.host;
+			const allowedHosts = {
+				4: ['youtube.com', 'www.youtube.com', 'youtu.be', 'vimeo.com'],
+				3: ['udemy.com'],
+				2: ['amazon.com'],
+				5: ['github.com', 'gitlab.com'],
+				6: ['medium.com', 'dev.to']
+			};
+			if (allowedHosts[4].includes(host)) resourceType = 4;
+			else if (allowedHosts[3].includes(host) || u.pathname.includes('course')) resourceType = 3;
+			else if (allowedHosts[2].includes(host) || u.pathname.includes('pdf') || u.pathname.includes('book')) resourceType = 2;
+			else if (allowedHosts[5].includes(host)) resourceType = 5;
+			else if (allowedHosts[6].includes(host) || u.pathname.includes('blog') || t.includes('blog') || u.pathname.endsWith('.pdf')) resourceType = 6;
 			r.resource_type_id = !!row.resource_type_id ? row.resource_type_id : resourceType;
EOF
Copilot is powered by AI and may make mistakes. Always verify output.
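
One behavior change to verify here: an exact host list no longer matches subdomains that the old substring test did match, for example `www.udemy.com` or `gist.github.com`. If those should still be classified, a domain comparison on a dot boundary is one option. This is a sketch with hypothetical helper names (`hostMatches`, `hostnameOf`, `isOnDomain`), not something taken from the PR.

```typescript
/** True if hostname is the domain itself or a subdomain of it. */
function hostMatches(hostname: string, domain: string): boolean {
	const h = hostname.toLowerCase();
	const d = domain.toLowerCase();
	// The leading dot keeps 'notgithub.com' from matching 'github.com'.
	return h === d || h.endsWith('.' + d);
}

/** Hostname of an absolute URL, or null if the string cannot be parsed. */
function hostnameOf(url: string): string | null {
	try {
		return new URL(url).hostname;
	} catch {
		return null;
	}
}

/** True if the URL's host belongs to any of the given domains. */
function isOnDomain(url: string, domains: readonly string[]): boolean {
	const host = hostnameOf(url);
	return host !== null && domains.some((d) => hostMatches(host, d));
}
```

With this, `isOnDomain(r.url, ['github.com', 'gitlab.com'])` would accept `https://gist.github.com/...` while still rejecting `https://github.com.evil.example/`.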


github-actions bot commented Oct 7, 2024

Your database schema is up-to-date.

Labels: content, dependencies, design, enhancement, javascript, ops, page, refactor
Projects
Status: 🏗 In progress
1 participant