Run program to check for misplaced headers/manglacharan #1889
Thank you for finding this issue. At https://github.com/shabados/database/blob/next/data/Sri%20Guru%20Granth%20Sahib%20Ji/0829.json#L396, the line above this shabad is NEKY, but it should be inside 7ZA, preceding 8LK7.
Program used to check (it can be modified to match more titles; I can't be sure I found them all):

```python
import json
import os

from gurmukhiutils.unicode import unicode

DATA_DIR = "data/"

# Collections to scan; each is a sub-folder of DATA_DIR holding JSON files
folders = [
    "Sri Guru Granth Sahib Ji",
    "Sri Dasam Granth",
    "Sarabloh Granth",
    "Vaaran Bhai Gurdas Ji",
    "Kabit Savaiye Bhai Gurdas Ji",
    "Ganj Nama Bhai Nand Lal Ji",
    "Ghazals Bhai Nand Lal Ji",
    "Jot Bigas Bhai Nand Lal Ji",
    "Zindagi Nama Bhai Nand Lal Ji",
]

# Titles that must match the whole line exactly
titlesExact = [
    "ਜਾਪੁ ॥",
    "ਸਲੋਕੁ ॥",
    "ਚਾਚਰੀ ਛੰਦ ॥",
    "ਏਕ ਅਛਰੀ ਛੰਦ ॥",
    "ਸ੍ਵੈਯਾ ॥",
    "ਦੋਹਰਾ ॥",
    "ਰਹਰਾਸਿ ਸਾਹਿਬ",
    "ਸੋਰਠਾ ।",
]

# Titles that count as a match anywhere inside the line
titlesFuzzy = [
    "ੴ",
    "॥ ਜਪੁ ॥",
    "ਪਾਤਿਸਾਹੀ ੧੦",
    "ਸ੍ਰੀ ਭਗਉਤੀ ਜੀ ਸਹਾਇ",
    "ਵਾਰ ਸ੍ਰੀ ਭਗਉਤੀ ਜੀ ਕੀ",
    "॥ ਤ੍ਵ ਪ੍ਰਸਾਦਿ ॥",
    "॥ ਤ੍ਵ ਪ੍ਰਸਾਦਿ ਕਥਤੇ ॥",
    "ਭੁਜੰਗ ਪ੍ਰਯਾਤ ਛੰਦ ॥",
    "॥ ਚਾਚਰੀ ਛੰਦ ॥",
    "॥ ਸਵੱਯੇ ॥",
    "॥ ਚੌਪਈ ॥",
    "ਵਾਹਿਗੁਰੂ ਜੀਓ ਹਾਜ਼ਰ ਨਾਜ਼ਰ ਹੈ",
    "ਮਹਲਾ ੧",
    "ਮਹਲਾ ੨",
    "ਮਹਲਾ ੩",
    "ਮਹਲਾ ੪",
    "ਮਹਲਾ ੫",
    "ਮਹਲਾ ੬",
    "ਮਹਲਾ ੭",
    "ਮਹਲਾ ੮",
    "ਮਹਲਾ ੯",
    "ਮਃ ੧",
    "ਮਃ ੨",
    "ਮਃ ੩",
    "ਮਃ ੪",
    "ਮਃ ੫",
    "ਮਃ ੬",
    "ਮਃ ੭",
    "ਮਃ ੮",
    "ਮਃ ੯",
    "ਵਾਰ ੧",
    "ਵਾਰ ੨",
    "ਵਾਰ ੩",
    "ਵਾਰ ੪",
    "ਵਾਰ ੫",
    "ਵਾਰ ੬",
    "ਵਾਰ ੭",
    "ਵਾਰ ੮",
    "ਵਾਰ ੯",
]


def isTitle(text):
    """Return True if the Unicode line looks like a title/manglacharan."""
    if text in titlesExact:
        return True
    return any(title in text for title in titlesFuzzy)


for folder in folders:
    # Print the folder name as a section header
    print("\n")
    print(folder)
    folder_path = os.path.join(DATA_DIR, folder)
    # For each JSON file in the folder, in sorted order
    for file in sorted(os.listdir(folder_path)):
        if not file.endswith(".json"):
            continue
        with open(os.path.join(folder_path, file), "r", encoding="utf-8") as f:
            data = json.load(f)
        # Check the last line of each shabad to see if it's a title/manglacharan
        for shabad in data:
            if not shabad["lines"]:
                continue  # skip empty shabads instead of raising IndexError
            lastLine = shabad["lines"][-1]
            line_unicode = unicode(lastLine["gurmukhi"], "Sant Lipi")
            if isTitle(line_unicode):
                print(shabad["id"], lastLine["id"], line_unicode)
```

The output below lists lines (shabad ID, line ID, text) that should be moved to the next shabad:

```
Sri Guru Granth Sahib Ji
FRC QKH1 ਰਾਗੁ ਆਸਾ ਮਹਲਾ ੩ ਪਟੀ ॥
BBH NEKY ਬਿਲਾਵਲੁ ਮਹਲਾ ੫ ॥
D89 4B1H ਰਾਮਕਲੀ ਮਹਲਾ ੫ ਛੰਤ ॥
5YB 8H18 ਮਾਰੂ ਮਹਲਾ ੧ ॥
LSM 5AWP ਸਾਰਗ ਮਹਲਾ ੫ ॥
B0U E1D5 ਸਾਰਗ ਮਹਲਾ ੫ ॥
K84 GJXW ਸਾਰਗ ਮਹਲਾ ੫ ॥
Sri Dasam Granth
UZZ QPYH ਸ੍ਵੈਯਾ ॥
Sarabloh Granth
Vaaran Bhai Gurdas Ji
Kabit Savaiye Bhai Gurdas Ji
Ganj Nama Bhai Nand Lal Ji
Ghazals Bhai Nand Lal Ji
Jot Bigas Bhai Nand Lal Ji
Zindagi Nama Bhai Nand Lal Ji
```
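Fixing one of these flagged lines amounts to popping it off the end of its shabad and inserting it at the front of the next one. Here is a minimal sketch, assuming each data file is a list of shabads in reading order (as the script loads it) and that the target shabad is the next entry in the same file. `move_trailing_header` is a hypothetical helper. A header at the very end of a file would instead belong to the first shabad of the next file, and the sketch refuses that case rather than guessing:

```python
def move_trailing_header(shabads, shabad_id):
    """Move the last line of shabad `shabad_id` to the start of the following shabad.

    Hypothetical helper: assumes `shabads` is one data file's list of shabad
    dicts (each with "id" and "lines") in reading order. Mutates the list in
    place and returns the moved line.
    """
    for index, shabad in enumerate(shabads):
        if shabad["id"] != shabad_id:
            continue
        if index + 1 >= len(shabads):
            # The next shabad lives in another file; handle that case by hand.
            raise ValueError(f"{shabad_id} is the last shabad in this file")
        if not shabad["lines"]:
            raise ValueError(f"{shabad_id} has no lines to move")
        line = shabad["lines"].pop()
        shabads[index + 1]["lines"].insert(0, line)
        return line
    raise KeyError(shabad_id)


# Toy data mirroring the BBH/NEKY case; "X1" is a placeholder line ID.
shabads = [
    {"id": "BBH", "lines": [{"id": "X1"}, {"id": "NEKY"}]},
    {"id": "7ZA", "lines": [{"id": "8LK7"}]},
]
moved = move_trailing_header(shabads, "BBH")
print(moved["id"], [line["id"] for line in shabads[1]["lines"]])
# prints: NEKY ['NEKY', '8LK7']
```

The helper only ever moves the last line, because the checker only inspects the last line of each shabad.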
The script is very easy to modify, so here is a list of lines with the Sirlekh type that are not among the first two lines of a shabad (note that not all of these are wrong: some sirlekhs are three or more lines long, so you'd have to check all of them):
And here's a list of sirlekhs that show up after the 4th line of a shabad (these are probably more likely to require action):
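Both lists come from the same kind of filter: walk every shabad and report lines of a given type at or beyond some position. Here is a minimal sketch, assuming each line dict carries a `"type"` field whose value is `"Sirlekh"` for header lines. The field name and value are assumptions, not confirmed from the data files, and `find_lines_from` is a hypothetical helper. "Not the first two lines" corresponds to `min_index=2`, and "after the 4th line" to `min_index=4` (0-based):

```python
def is_sirlekh(line):
    # Assumed schema: a "type" field with the value "Sirlekh" marks a header line.
    return line.get("type") == "Sirlekh"


def find_lines_from(shabads, min_index, predicate):
    """Yield (shabad_id, line_id, index) for matching lines at 0-based index >= min_index."""
    for shabad in shabads:
        for index, line in enumerate(shabad["lines"][min_index:], start=min_index):
            if predicate(line):
                yield shabad["id"], line["id"], index


# Toy shabad: a three-line sirlekh up front and one stray header at the end.
shabads = [
    {
        "id": "S1",
        "lines": [
            {"id": "L1", "type": "Sirlekh"},
            {"id": "L2", "type": "Sirlekh"},
            {"id": "L3", "type": "Sirlekh"},
            {"id": "L4", "type": "Other"},
            {"id": "L5", "type": "Other"},
            {"id": "L6", "type": "Sirlekh"},
        ],
    }
]
print(list(find_lines_from(shabads, 2, is_sirlekh)))  # [('S1', 'L3', 2), ('S1', 'L6', 5)]
print(list(find_lines_from(shabads, 4, is_sirlekh)))  # [('S1', 'L6', 5)]
```

The L3 hit shows why the first list needs manual review (a three-line sirlekh is legitimate), while the stricter threshold only reports L6, the kind of trailing header the checker is looking for.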
Please note that this Sirlekh-based list does not match the results of the title-matcher method, so some lines need to have their type updated as well.
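To see exactly where the two methods disagree, one option is to collect the (shabad ID, line ID) pairs each method flags and take set differences. This is a sketch under assumptions: a line's type is taken to live in a `"type"` field with the value `"Sirlekh"` (not confirmed from the data files), `compare_methods` is a hypothetical helper, and in real use `by_title` would wrap the script's `isTitle` applied to the Unicode-converted text:

```python
def compare_methods(shabads, by_type, by_title):
    """Return (type_only, title_only): (shabad_id, line_id) pairs flagged by one method only."""
    flagged_by_type = set()
    flagged_by_title = set()
    for shabad in shabads:
        for line in shabad["lines"]:
            key = (shabad["id"], line["id"])
            if by_type(line):
                flagged_by_type.add(key)
            if by_title(line):
                flagged_by_title.add(key)
    return flagged_by_type - flagged_by_title, flagged_by_title - flagged_by_type


# Toy lines with already-converted Unicode text in a hypothetical "text" field.
shabads = [
    {
        "id": "S1",
        "lines": [
            {"id": "A", "type": "Sirlekh", "text": "ਸਾਰਗ ਮਹਲਾ ੫ ॥"},
            {"id": "B", "type": "Other", "text": "ਮਾਰੂ ਮਹਲਾ ੧ ॥"},
            {"id": "C", "type": "Sirlekh", "text": "verse text"},
        ],
    }
]
type_only, title_only = compare_methods(
    shabads,
    by_type=lambda line: line.get("type") == "Sirlekh",
    by_title=lambda line: "ਮਹਲਾ" in line["text"],
)
print(type_only)   # {('S1', 'C')}
print(title_only)  # {('S1', 'B')}
```

Entries in `title_only` look like headers but are not typed as such, so they are candidates for a type update; entries in `type_only` are either headers the title lists miss or lines whose type is wrong.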
And, just for good measure, I decided to run the title matcher against lines that are not among the first two of a shabad:
And then, not the first two:
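Running the title matcher over every line past the opening ones is a small change to the script's inner loop: slice off the first lines instead of taking only the last one. Here is a minimal sketch using a trimmed subset of the script's title lists and a hypothetical `late_title_lines` helper. In real use, `to_text` would be the script's `unicode(line["gurmukhi"], "Sant Lipi")` conversion; the toy data below is already Unicode and stored under a hypothetical `"text"` key:

```python
# Trimmed subset of the script's title lists; swap in the full lists for real runs.
TITLES_EXACT = {"ਸਲੋਕੁ ॥", "ਦੋਹਰਾ ॥"}
TITLES_FUZZY = ["ੴ", "ਮਹਲਾ ੧", "ਮਹਲਾ ੫", "ਮਃ ੫"]


def is_title(text):
    """Same rule as the script's isTitle: exact match, or any fuzzy title as a substring."""
    return text in TITLES_EXACT or any(title in text for title in TITLES_FUZZY)


def late_title_lines(shabads, to_text, skip=2):
    """Yield (shabad_id, line_id, text) for title-like lines after the first `skip` lines."""
    for shabad in shabads:
        for line in shabad["lines"][skip:]:
            text = to_text(line)
            if is_title(text):
                yield shabad["id"], line["id"], text


# Toy shabad: a proper header first, then two title-like lines further down.
shabads = [
    {
        "id": "S1",
        "lines": [
            {"id": "A", "text": "ਮਾਰੂ ਮਹਲਾ ੧ ॥"},
            {"id": "B", "text": "verse text"},
            {"id": "C", "text": "ਸਲੋਕੁ ॥"},
            {"id": "D", "text": "ਮਃ ੫ ॥"},
        ],
    }
]
for shabad_id, line_id, text in late_title_lines(shabads, to_text=lambda line: line["text"]):
    print(shabad_id, line_id, text)
```

Keeping `skip` as a parameter makes it easy to compare "not the first line" against "not the first two" without editing the loop.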