Update 01-regular-expressions.md #184

Merged
merged 2 commits into from
Apr 28, 2023

@lyndamk (Contributor) commented Mar 1, 2021

The two paragraphs below might be hard for a novice to digest. My suggestions:

  • Move the example of the phone numbers to the beginning of the first paragraph to give the idea more concreteness.

  • Provide an example of a regex for finding those phone numbers. This seems necessary because learners don't see actual regexes until the exercises.

  • Define a literal character and give an example using the phone number example.

  • Give an example of a metacharacter using the phone numbers.

  • The example of the escape in the second paragraph is understandable only if you already know regex. Is it necessary here, or could it come after some exercises, with its own example and exercise?

Regular expressions rely on the use of literal characters (example) and metacharacters (example). A metacharacter is any American Standard Code for Information Interchange (ASCII) character that has a special meaning. By using metacharacters and possibly literal characters, you can construct a regex for finding strings or files that match a pattern rather than a specific string. For example, say your organization wants to change the way it displays telephone numbers on its website by removing the parentheses around the area code. Rather than searching for each specific phone number (which could take forever and be prone to error) or searching for every open parenthesis character (which could also take forever and return many false positives), you could search for the pattern of a phone number.
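One possible way to make the phone-number scenario concrete is the minimal Python sketch below. It assumes numbers written as `(555) 123-4567` and a target format of `555-123-4567`; the sample text, the target format, and the names `PHONE` and `strip_area_code_parens` are hypothetical and do not come from the lesson.

```python
import re

# Assumed input format: "(555) 123-4567" (hypothetical, for illustration only).
# \( and \) are escaped, so they match literal parentheses.
# \d{3} uses metacharacters and means "exactly three digits".
# The space and the hyphen are literal characters that match themselves.
PHONE = re.compile(r"\((\d{3})\) (\d{3})-(\d{4})")

def strip_area_code_parens(text: str) -> str:
    """Rewrite "(555) 123-4567" as "555-123-4567" wherever it appears."""
    return PHONE.sub(r"\1-\2-\3", text)

sample = "Call (555) 123-4567 (weekdays) or (555) 987-6543."
print(strip_area_code_parens(sample))
```

Because the pattern describes the shape of a phone number, the parenthetical "(weekdays)" is left alone, which is the false positive that a plain search for every open parenthesis would hit.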

Since regular expressions define some ASCII characters as "metacharacters" that have more than their literal meaning, it is also important to be able to "escape" these metacharacters to use them for their normal, literal meaning. For example, the period `.` means "match any character", but if you want to match a period then you will need to use a `\` in front of it to signal to the regular expression processor that you want to use the period as a plain old period and not a metacharacter. That notation is called "escaping" the special character. The concept of "escaping" special characters is shared across a variety of computational settings, including Markdown and Hypertext Markup Language (HTML).
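A short Python sketch of the escaping behaviour described above might look like the following. The sample string and variable names are made up for illustration.

```python
import re

text = "v1.2 and v1x2"  # hypothetical sample text

# Unescaped: "." is a metacharacter that matches any single character
# (other than a newline, by default), so both tokens match.
any_char_matches = re.findall(r"v1.2", text)

# Escaped: "\." matches only a literal period.
literal_matches = re.findall(r"v1\.2", text)

print(any_char_matches)  # ['v1.2', 'v1x2']
print(literal_matches)   # ['v1.2']
```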

@yoyology

I also feel that the initial paragraph of the lesson is difficult to understand. As it stands, the text reads:

Regular expressions are a concept and an implementation used in many different programming environments for sophisticated pattern matching. They are an incredibly powerful tool that can amplify your capacity to find, manage, and transform data and files.

A regular expression, often abbreviated to regex, is a method of using a sequence of characters to define a search to match strings, i.e. “find and replace”-like operations. In computation, a ‘string’ is a contiguous sequence of symbols or values. For example, a word, a date, a set of numbers (e.g., a phone number), or an alphanumeric value (e.g., an identifier). A string could be any length, ranging from empty (zero characters) to one that spans many lines of text (including line break characters). The terms ‘string’ and ‘line’ are sometimes used interchangeably, even when they are not strictly the same thing.

I would recommend the following:

Many programming environments need a way to match patterns of characters, for example to check that an e-mail address has been entered correctly in an online form. A common tool for this purpose is the regular expression. Using regular expressions (or regex for short) allows you to amplify your capacity to find, manage, and transform data and files.

A regular expression is a method of using a sequence of characters to define a search to match strings, i.e. “find and replace”-like operations. In computation, a ‘string’ is a contiguous sequence of symbols or values. For example, a word, a date, a set of numbers (e.g., a phone number), or an alphanumeric value (e.g., an identifier). A string could be any length, ranging from empty (zero characters) to one that spans many lines of text (including line break characters). The terms ‘string’ and ‘line’ are sometimes used interchangeably, even when they are not strictly the same thing.

The only change to the second paragraph is to remove the reference to abbreviation, since I've moved that to the first paragraph.

Removing the placeholder for an example; this is flagged as an issue for a future update.
@sharilaster
Contributor

Thank you @lyndamk and @yoyology for the excellent suggestions -- and my apologies that it's taken so long to address them. I've removed the placeholder for an example and will confirm that the need for one is tracked in an open issue. The suggested revisions to the lesson introduction are now open as a separate issue (#207), so it should be fairly straightforward to create a new PR with the updated language once the migration to the Carpentries workbench is complete.

@sharilaster
Contributor

Confirmed -- the need for an example is open in #187.

zkamvar pushed a commit that referenced this pull request May 3, 2023
Update 01-regular-expressions.md