Missing Urls Analysis - Helio_Events_Knowledgebase_Website #557

Closed
Tracked by #556
CarsonDavis opened this issue Jan 9, 2024 · 0 comments

CarsonDavis commented Jan 9, 2024

Some Basics

HEK: the Heliophysics Events Knowledgebase. It contains the following two components:

  • HER: Heliophysics Events Registry
    • database of events that occurred on the Sun, such as a flare
  • HCR: Heliophysics Coverage Registry
    • database of operational sequences
    • for example, it records that a telescope was looking at the Sun's north polar region at a particular time on a particular day

There are a handful of core website and documentation pages, as well as a few search pages I haven't fully differentiated between. The rest of the pages are either events or filtered event lists.

Supposedly, isolsearch is the main UI tool that users use to search the HEK; however, there are other search interfaces on the site.

Lockheed and Stanford are both involved in this project, and there may or may not be relevant material hosted on their websites. For example, the HER API documentation is located on Stanford's website.

There is a Python package called sunpy that lets users interface with the HEK API. It is also hosted on its own page.

URL Stats

Here is the spreadsheet that lists all 80,519 URLs we have ever found on the website, with a preliminary assessment of each URL.

  • Raw Counts
    • total unique: 80,519
    • new: 13,390
    • original: 67,131
  • Deltas
    • missing from new: 67,128
    • missing from original: 13,389
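
For reference, here is a minimal sketch of how counts and deltas like these can be derived from two URL lists using set operations. This is not necessarily how the spreadsheet was built; the function name `url_deltas` and the example.org URLs are hypothetical and only serve as illustration.

```python
def url_deltas(original_urls, new_urls):
    """Compare two crawls' URL lists and return raw counts and deltas."""
    # Sets drop duplicate URLs within each crawl.
    original = set(original_urls)
    new = set(new_urls)
    return {
        "total_unique": len(original | new),           # URLs seen in either crawl
        "original": len(original),                      # URLs in the original crawl
        "new": len(new),                                # URLs in the new crawl
        "missing_from_new": len(original - new),        # in original, not in new
        "missing_from_original": len(new - original),   # in new, not in original
    }


# Toy data (hypothetical URLs, not taken from the HEK site).
original = [
    "https://example.org/a",
    "https://example.org/b",
    "https://example.org/c",
]
new = [
    "https://example.org/c",
    "https://example.org/d",
]

stats = url_deltas(original, new)
for name, count in stats.items():
    print(f"{name}: {count}")
```

By inclusion-exclusion, the raw counts above imply that the two crawls share 13,390 + 67,131 − 80,519 = 2 URLs, so the new and original collections are almost completely disjoint.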

Discoveries

Multiplying Urls in original connector

Connector Differences

The differences shown below are between the sde and smd folders, and not necessarily between the two document collections. These differences certainly wouldn't account for the url discrepancies we are seeing.

  • title mapping moved to top
  • doc type mapping moved to top
  • shard index removed

Questionable Settings

Site Analysis

What do we definitely want?

What do we possibly want?

What do we probably NOT want?

Lingering Questions

CarsonDavis self-assigned this Jan 9, 2024
CarsonDavis changed the title from "Missing Urls - Helio_Events_Knowledgebase_Website" to "Missing Urls Analysis - Helio_Events_Knowledgebase_Website" Jan 24, 2024