Some Basics
HEK: the Heliophysics Events Knowledgebase. It contains the following two components:
HER: Heliophysics Events Registry
A database of events that occurred on the Sun, such as a flare.
HCR: Heliophysics Coverage Registry
A database of operational sequences. It records, for example, that a telescope was looking at the Sun's north polar region at a particular time on a particular day.
There are a handful of core website and documentation pages, as well as a few search pages I haven't fully differentiated between. The rest of the pages are either events or filtered event lists.
isolsearch is supposedly the main UI tool that users use to search the HEK; however, there are other search interfaces on the site.
Lockheed and Stanford are both involved in this project, and there may be relevant material hosted on their websites. For example, the HER API documentation is located on Stanford's website.
There is a Python package called sunpy that lets users interface with the HEK API. It is hosted on its own site.
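As a rough illustration of what interfacing with the HEK API involves, the sketch below builds a HER search URL using only the Python standard library. The base URL is the one that appears in these notes. The parameter names (`cmd`, `type`, `event_type`, `event_starttime`, `event_endtime`, `event_coordsys`, `x1`, `x2`, `y1`, `y2`, `cosec`) and the `fl` event code are assumptions written from memory and should be checked against the HER API documentation. `build_her_search_url` is a hypothetical helper, not part of sunpy.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

HER_BASE = "https://www.lmsal.com/hek/her"  # base URL taken from these notes


def build_her_search_url(event_type, start, end):
    """Build a HER search URL (hypothetical helper; parameter names are assumptions)."""
    params = {
        "cmd": "search",
        "type": "column",
        "event_type": event_type,        # e.g. "fl" for flares (assumed code)
        "event_starttime": start,
        "event_endtime": end,
        "event_coordsys": "helioprojective",
        "x1": -1200, "x2": 1200,         # assumed full-disk bounding box
        "y1": -1200, "y2": 1200,
        "cosec": 2,                      # assumed to select JSON output
    }
    return HER_BASE + "?" + urlencode(params)


url = build_her_search_url("fl", "2011-08-09T00:00:00", "2011-08-10T00:00:00")
query = parse_qs(urlsplit(url).query)    # parse it back to inspect the parameters
print(url)
```

Every distinct combination of parameters yields a distinct URL, which is one reason a crawler that follows query links on this site can discover far more URLs than there are events.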
URL Stats
This spreadsheet lists all 80,519 URLs we have ever found on the website and contains a preliminary assessment of each URL.
Raw Counts
total unique: 80,519
new: 13,390
original: 67,131
Deltas
missing from new: 67,128
missing from original: 13,389
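The counts above can be read as set operations on the two URL collections. The following is a minimal sketch of that comparison, assuming each collection is available as a plain list of URL strings; `compare_url_sets` and the sample URLs are made up for illustration and are not the actual crawl data.

```python
def compare_url_sets(original_urls, new_urls):
    """Summarize how two URL collections overlap, using set arithmetic."""
    original, new = set(original_urls), set(new_urls)
    return {
        "total unique": len(original | new),           # union of both collections
        "new": len(new),
        "original": len(original),
        "missing from new": len(original - new),       # only in the original
        "missing from original": len(new - original),  # only in the new one
    }


# Tiny made-up sample, not real crawl data.
original_sample = [
    "https://example.org/a",
    "https://example.org/b",
    "https://example.org/c",
]
new_sample = ["https://example.org/c", "https://example.org/d"]

stats = compare_url_sets(original_sample, new_sample)
print(stats)
```

Under this definition, total unique equals the two "missing" counts plus the overlap, so the figures above imply an overlap of only 2 URLs (80,519 - 67,128 - 13,389). Strict set arithmetic would then give 13,391 new and 67,130 original URLs, each one off from the reported counts; the reported figures may have been computed slightly differently, for example with a header row or a duplicate counted, but that is only a guess.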
Discoveries
Multiplying URLs in the original connector
For a single event, the original connector would actually list 3 URLs:
Connector Differences
The differences shown below are between the sde and smd folders, and not necessarily between the two document collections. These differences certainly wouldn't account for the URL discrepancies we are seeing.
title mapping moved to top
doc type mapping moved to top
shard index removed
Questionable Settings
https://www.lmsal.com/hek/her?cmd=home is explicitly included. However, it is also explicitly excluded. The exclude pattern lacks a star at the end, so it might exclude only the root page, which could be intentional.
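To make the trailing-star point concrete, here is a small sketch of how an exclude pattern with and without a trailing star would behave. It assumes a simple glob-like rule in which only a trailing `*` acts as a wildcard; the connector's real matching rules may differ, and `matches` and the `deeper` URL are hypothetical.

```python
def matches(pattern, url):
    """Glob-like match where only a trailing '*' is a wildcard (assumed semantics)."""
    if pattern.endswith("*"):
        return url.startswith(pattern[:-1])  # prefix match
    return url == pattern                    # exact match only


root = "https://www.lmsal.com/hek/her?cmd=home"
deeper = root + "&example=1"  # hypothetical URL, for illustration only

exclude_exact = root          # pattern without a trailing star
exclude_prefix = root + "*"   # pattern with a trailing star

# Without the star only the root itself is excluded.
print(matches(exclude_exact, root), matches(exclude_exact, deeper))
# With the star everything under that prefix is excluded too.
print(matches(exclude_prefix, root), matches(exclude_prefix, deeper))
```

If the connector behaves like this, the star-less exclude would drop only the root page while URLs that extend it stay eligible for indexing.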
Site Analysis
What do we definitely want?
It will be important that this page, whichever one it turns out to be, contains all the right content: the movie file link, the comments from annotations, the event description, etc.
What do we possibly want?
I think we probably want this page; imagine a user searching for recent solar events. However, we do not want to index the individual event links on this page, as they would duplicate the complete event listing.
These are 217 thousand of the most downloaded events, filterable by instrument. Maybe we index the main page, but I definitely don't think we should follow each link.
What do we probably NOT want?
random queries, or example queries, such as those found on the main HEK page:
Pages like this are next-page listings from the recent events page, and we probably don't need them. It is only important to point users to the existence of a recent events page, not to exhaustively show everything considered a recent event separately from the main listing of all events.
Note that this offset goes up to huge numbers; I tried 10,000 and was still getting results. I'm guessing it's actually all events ordered by date?
These are observations planned to occur in the next 24 hours. Oddly, it says there are 184 thousand of them. I don't know whether this is accurate, but either way this page will be irrelevant a day after it's indexed, so we would definitely need to think about how, or even whether, we would index it.
Questionable Settings
<SelectionQuery>title <> 'Heliophysics Event Registry'</SelectionQuery>
doing?
Site Analysis
What do we definitely want?
What do we possibly want?
What do we probably NOT want?
Lingering Questions