Scripts and Modules

We used the node command briefly in Chapter 1 to explore Node’s REPL mode and execute simple scripts. In this chapter, we’ll learn how Node loads and executes scripts and modules. We’ll start by exploring more of the options, arguments, and environment variables that can be used with the node command, and learn more about what we can do in a REPL session. Then we’ll learn about the steps Node takes to load and execute a module.

Node CLI

The node command has many options that can be used to customize its behavior. It also supports arguments and environment variables to further customize what it does, and to pass data from the operating system environment to Node’s process environment.

Let’s take a look. In the terminal, type:

$ node -h | less

This will output the help documentation for the command (one page at a time because we piped the output to the less command). I find it useful to familiarize myself with the help pages for the commands I use often.

Usage: node [options] [ script.js ] [arguments]
       node inspect [options] [ script.js | host:port ] [arguments]

Options:
  -                           script read from stdin (default if no
                              file name is provided, interactive mode
                              if a tty)
  --                          indicate the end of node options
  --abort-on-uncaught-exception
                              aborting instead of exiting causes a
                              core file to be generated for analysis
  --build-snapshot            Generate a snapshot blob when the
                              process exits. Currently only supported
                              in the node_mksnapshot binary.
  -c, --check                 syntax check script without executing
  --completion-bash           print source-able bash completion
                              script
  -C, --conditions=...        additional user conditions for
                              conditional exports and imports
  --cpu-prof                  Start the V8 CPU profiler on start up,
:

The first two lines specify how to use the node command. Anything in square brackets is optional, which means, according to the first line, that we can use the node command on its own without any options, scripts, or arguments. That’s what we did to start a REPL session. To execute a script, we used the node script.js syntax (script can be any name there).

What’s new here is that there are options and arguments that we can use with the command. Let’s talk about these.

Tip

The second usage line is to start a terminal debugging session for Node. While that’s sometimes useful, in Chapter 4, I’ll show you a much better way to debug code in Node.

In the help page, right after the usage lines, there is a list of all the options that you can use with the command. Most of these options are advanced, but knowing of their existence is a helpful reference. You should scan through this list just to get a quick idea of all the types of things that you can do with the command. Let me highlight a few of the options that I think you should be aware of.

Options and Arguments

The --check option (or -c) lets you check the syntax of a Node script without running that script. An example use of this option is to automate a syntax check before sharing code with others.

The --eval and --print options (or -e and -p) can both be used for executing code directly from the command line. I like the -p one more because it executes and prints (just like in the REPL mode). To use these options, you pass them a string of Node code. For example:

$ node -p "Math.random()"

This is handy, as you can use it to create your own powerful commands (and alias them if you want). For example, say you need a command to generate a unique random string (to be used as a password maybe). You can leverage Node’s crypto module in a short -p one liner:

$ node -p "crypto.randomBytes(16).toString('hex')"

Pretty cool, isn’t it!

Note

Note how the crypto module is available to the -p option without needing to require it (just like in the REPL mode).

How about a command to count the words in any file?! This one will help us understand how to use arguments with the node command:

$ node -p "fs.readFileSync(process.argv[1])
             .toString().split(/\s+/).length" ~/.bashrc

Don’t panic. There’s a lot going on with this one. It leverages the powers of both Node and JavaScript. Go ahead and try it first. You can replace ~/.bashrc with a path to any file on your system.

Let’s decipher this one a bit:

The readFileSync function is part of the built-in node:fs module. It takes a file path as an argument and synchronously returns a binary representation of that file’s data (a Buffer). That’s why I chained a .toString call to it, to get the file’s actual content (in UTF-8). Furthermore, instead of hardcoding the file path in the command, I put the path as the first argument to the node command itself and used process.argv[1] to read the value of that argument (see the explanation of that in the next sidebar). This enables us to use the word-counting one-liner with any file. We can alias it (without the path argument) and then use the alias with a path argument as shown in Figure 1.

$ alias count-words="node -p 'fs.readFileSync(process.argv[1])
                      .toString().split(/\s+/).length'"

Then, once I have the content of the file, I use JavaScript’s split method (which is available on any string) to split the content using the /\s+/ regular expression (which matches one or more whitespace characters, including spaces, tabs, and newlines). This produces an array of words, and we can then count the array items with a .length call to find the number of words.
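The same idea can be written as a small script. This is only a sketch: the countWords function name and the count-words.js file name are my own choices for illustration. It also trims the text first, because splitting a string that ends with a newline (as most files do) leaves an empty item at the end of the array, which the one-liner counts as a word.

```javascript
const fs = require('node:fs');

// Count whitespace-separated words in a string.
function countWords(text) {
  const trimmed = text.trim();
  // An empty (or whitespace-only) string has no words.
  return trimmed === '' ? 0 : trimmed.split(/\s+/).length;
}

// Hypothetical usage: node count-words.js <path-to-file>
// In a script, the first user argument is process.argv[2].
const filePath = process.argv[2];
if (filePath && fs.existsSync(filePath)) {
  console.log(countWords(fs.readFileSync(filePath, 'utf8')));
}
```

Passing 'utf8' to readFileSync makes it return a string directly, so the .toString call is not needed here.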

Figure 1. Aliasing a Node print one-liner
The process.argv array

We know from the usage syntax that the node command can take arguments. These arguments can be any list of strings and when you specify them, you make them available to the Node process.

The word-counting one-liner used process.argv[1]. The process object is a global scope object, and it simply represents Node’s interface to the actual OS process that executes the node command. The argv property is an array that holds all the arguments you pass to the node command (regardless of how you’re using the command). To understand that, run the following command:

$ node -p "process.argv" hello world

This will output the entire array of arguments. Node uses the first element in that array for the path of the node command itself, then the arguments are listed in order. That’s why, in the word-counting one-liner, I used the second element of argv.

Note that if you’re executing a script, the path for that script will be the second element of process.argv, and the arguments (if any) will be listed starting with the third element.
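To make the layout of process.argv concrete when running a script, here is a minimal sketch. The splitArgv helper is a hypothetical name I’m using for illustration. It is not part of Node.

```javascript
// Split an argv-style array into its three parts:
// the node binary path, the script path, and the user arguments.
function splitArgv(argv) {
  const [nodePath, scriptPath, ...userArgs] = argv;
  return { nodePath, scriptPath, userArgs };
}

// When this file is run with: node args.js hello world
// userArgs will be [ 'hello', 'world' ]
const { userArgs } = splitArgv(process.argv);
console.log(userArgs);
```

Many scripts get the same result with process.argv.slice(2).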

The --require option (or -r) allows you to require a module before executing the main script. This is useful if you need to load a specific module before running your code or if you want to set up certain configurations or load some variable values. This one only works with CommonJS modules. For ES Modules, you can use the --import option.

For example, let’s say you have a Node project that requires the use of a module called dotenv, which loads environment variables from a file. Normally, you would need to include something like require('dotenv').config() at the beginning of your main file to use the dotenv module. However, with the -r option, you can load the module automatically without having to add it to any file:

$ node -r dotenv/config index.js
Note

Node supports loading environment variables from a file directly with the --env-file option. We’ll see an example of that shortly.

The --watch option allows you to watch a file (and its dependencies) for changes. It automatically restarts Node when a change is detected. This is very useful in development environments. You can test it with any of the files we wrote so far. For example, to run the basic web server example from Chapter 1 in watch mode, you can run:

$ node --watch index.js

This will start the server in watch mode. Make a change to the index.js file (change the Hello World string, for example) and notice how the node command will automatically restart.

The --test option makes Node look for and execute code that’s written for testing. Node uses a simple naming convention for that. For example, it’ll look for any files named with a .test.js suffix, or files whose names begin with test-.
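As a rough idea of what such a file can look like, here is a minimal sketch that uses the built-in node:test module. The add function and the math.test.js file name are made up for this example.

```javascript
// math.test.js (hypothetical file name matching the .test.js convention)
const test = require('node:test');
const { strictEqual } = require('node:assert');

// A tiny function to test, defined inline to keep the example self-contained.
function add(a, b) {
  return a + b;
}

test('add() sums two numbers', () => {
  strictEqual(add(2, 3), 5);
});
```

Running node --test in the folder that contains this file should pick it up and report one passing test. You can also run the file directly with the node command.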

There are a lot more options, but most of them are for advanced use. It’s good to be aware of them so that in the future, you can look up if there’s one particular option that might make a task you’re doing simpler.

Since Node is a wrapper around V8, and V8 itself has CLI options, the node command accepts many V8 options as well. The list of all the V8 options you can use with the node command can be printed with:

$ node --v8-options | less

This is an even bigger list! You can set JavaScript harmony flags (to turn on/off experimental features), you can set tracing flags, customize the engine memory management, and many other customizations. As with the node command options, it’s good to know that all these options exist.

Environment Variables

Toward the end of the node -h output, you can see a list of environment variables, like NODE_DEBUG, NODE_PATH, and many more. Environment variables are another way to customize the behavior of Node or make custom data available to the Node process (similar to command arguments).

Every time you run the node command, you start an operating system process. In Linux, the command ps can be used to list all running processes. If you run the ps command while a Node process is running (like the basic web server example), one of the listed processes will be Node (and you can see its process ID, and stop it from the terminal if you need to). Here’s a command to output all process details and filter the output for processes that have the word node in them:

$ ps -ef | grep "node"

The process object represents a bridge between the Node environment and the operating system environment. We can use it to exchange information between Node and the operating system. In fact, when you console.log a message, under the hood, the code is basically using the process object to write a string to the operating system stdout (standard output) data stream.
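Here is a small sketch of that idea. The printLine function is a simplified stand-in I wrote for illustration. The real console.log does more work (such as formatting its arguments) before writing to the stream.

```javascript
// Write a line of text directly to the stdout stream of the process.
function printLine(message) {
  // write() returns a boolean that tells you whether it is
  // fine to keep writing right away or better to wait.
  return process.stdout.write(`${message}\n`);
}

printLine('Hello from stdout');
```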

Environment variables are one way to pass information from the operating system environment (used to execute the node command), to the Node environment, and we can read their values using the env property of the process object.

Here’s an example to demonstrate that:

$ NAME="Reader" node -p "'Hello ' + process.env.NAME"

This will output Hello Reader. It sets an environment variable NAME then reads its value with process.env.NAME. You can set multiple environment variables if you need, either directly from the command line like this example, or using the Linux export command prior to executing the node command:

$ export GREETING="Hello"; export NAME="Reader"; \
  node -p "process.env.GREETING + ' ' + process.env.NAME"
Figure 2. Using environment variables in Node
Tip

In Linux (and macOS), you can use a semicolon to execute multiple commands on the same line, and \ to split a command into multiple lines.

You can use environment variables to make your code customizable on different machines or environments. For example, the basic web server example in Chapter 1 hard-coded the port to be 3000. However, on a different machine, 3000 might not be available, or you might need to run the server on a different port in a production environment. To do that, you can modify the code to use process.env.PORT ?? 3000 instead of just 3000 (in the listen method) and then run the node command with a custom port when you need to:

$ PORT=4000 node index.js

Note that if you don’t specify a port, the default port will be 3000 because I used the ?? (nullish coalescing) operator to specify a fallback value when process.env.PORT does not have one. This is a common practice.
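One detail worth keeping in mind is that environment variable values are always strings. The following sketch shows one way to read a port with a fallback and convert it to a number. The resolvePort function is a hypothetical helper, and taking the environment object as a parameter is just my choice to make it easy to try with different values.

```javascript
// Read the port from an environment object, falling back to 3000.
function resolvePort(env = process.env) {
  const raw = env.PORT ?? '3000';
  const port = Number.parseInt(raw, 10);
  if (Number.isNaN(port)) {
    throw new Error(`Invalid PORT value: ${raw}`);
  }
  return port;
}

console.log(resolvePort({ PORT: '4000' })); // 4000
console.log(resolvePort({})); // 3000
```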

Note

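You can’t use Node’s process.env object to change an environment variable in the operating system environment that started the process. It’s basically a copy of all the environment variables available to the process, so any changes you make to it are only visible to the Node process itself (and to child processes it starts).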
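To see what changing process.env does and does not affect, here is a minimal sketch. The GREETING and RETRIES variable names are arbitrary examples. It sets two values, then starts a child node process to show that the child receives them. The shell you launched the script from remains unchanged.

```javascript
const { execFileSync } = require('node:child_process');

// Changes to process.env are local to this process.
process.env.GREETING = 'Hello';

// Values are converted to strings when assigned.
process.env.RETRIES = 3;
console.log(typeof process.env.RETRIES); // string

// A child process inherits the modified environment by default.
const childOutput = execFileSync(
  process.execPath, // path of the currently running node binary
  ['-p', 'process.env.GREETING'],
  { encoding: 'utf8' },
).trim();

console.log(childOutput); // Hello
```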

The list of environment variables shown toward the end of node -h output are Node’s built-in environment variables. These are variables that Node will look for and use if they have values. Here are a few examples:

  • NODE_PATH can be used to simplify require calls in CommonJS modules by resolving module names against a list of absolute folder paths instead of using long relative paths.

  • NODE_OPTIONS is an alternative way to specify the options Node supports instead of passing them to the command line each time.

  • NODE_DEBUG can be used to tell Node to output more debugging information when it uses certain libraries. We give it a comma-separated list of modules to debug. For example, with NODE_DEBUG=fs,http, Node will start outputting debugging messages when the code uses either the node:fs or node:http modules. Many packages support this environment variable.

Figure 3. Using NODE_DEBUG with http
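Your own code can opt in to the NODE_DEBUG mechanism as well, using the debuglog function from the built-in node:util module. The following is a small sketch. The myapp section name and the connect function are invented for this example.

```javascript
const util = require('node:util');

// Create a logger that only prints when NODE_DEBUG includes "myapp".
const debug = util.debuglog('myapp');

function connect(host) {
  debug('connecting to %s', host);
  return `connected:${host}`;
}

console.log(connect('localhost'));
```

If you save this in a file and run it with NODE_DEBUG=myapp set, the debug message is written to stderr, prefixed with the section name and the process ID. Without that variable, the debug call does nothing.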

You can also put all the environment variables you need to set in a file (like a .env file for example), and then instruct Node to include all of the values defined in that file in the process.env object, using the --env-file option of the node command. For example, if you have the following .env file:

Example .env file
PORT=3000
NODE_DEBUG=fs,http

You can execute a Node script with these environment variables set using the command:

$ node --env-file=.env script.js
Tip

You can use multiple environment files if you need to.

Node’s REPL Mode

In Node’s REPL mode, as we learned in Chapter 1, you can type any JavaScript code, and Node will execute it and automatically print its result. This is a convenient way to quickly test short JavaScript expressions (and it works for bigger code too). There are a few other helpful things you can do in REPL mode beyond the quick tests.

In REPL mode, you usually type an expression (for example: 0.1 + 0.2), and hit Enter to see its result. You can also type statements that are not expressions (for example: let v = 21;) and when you hit Enter, the variable v will be defined, and the REPL mode will print undefined since that statement does not evaluate to anything. If you need to clear the screen, you can do so with CTRL+L.

If you try to define a function, you can write the first line and hit Enter, and the REPL mode will detect that your line is not complete, and it will go into a multiline mode so that you can complete it. Try and define a small function to test that.

Figure 4. Node REPL multiline mode

The REPL multiline mode is limited, but there’s an integrated basic editor available within REPL sessions as well. While in a REPL session, type .editor to start the basic editor mode. You can then type as many lines of code as you need, define multiple functions, or paste code from the clipboard. When you are done, hit CTRL+D to have Node execute all the code you typed in the editor.

The .editor command is one of many REPL commands which you can see by typing the .help command:

> .help
.break    Sometimes you get stuck, this gets you out
.clear    Alias for .break
.editor   Enter editor mode
.exit     Exit the REPL
.help     Print this help message
.load     Load JS from a file into the REPL session
.save     Save all evaluated commands in this REPL session to a file

Press Ctrl+C to abort current expression, Ctrl+D to exit the REPL

The .break command lets you get out of weird cases in REPL sessions, for example, when you paste some code in Node’s multiline mode and you are not sure how many curly braces you need to get to an executable state. You can completely discard your pasted code by using a .break command (or pressing Ctrl+C once). This saves you from killing the whole session to get yourself out of situations like these.

The .exit command exits the REPL session (just like Ctrl+D).

The .save command enables you to save all the code you typed in one REPL session into a file. The .load command enables you to load JavaScript code from a file and make it all available within the REPL session. Both of these commands take a file name as an argument.

One of my favorite things about Node’s REPL mode is how I can inspect basically everything that’s available natively in Node without needing to require them. All the built-in modules (like node:fs, node:http, etc) are preloaded in a REPL session and you can use the TAB key to inspect their APIs.

Just like in a terminal or editor, hitting the TAB key once in a REPL session will attempt to auto-complete anything you partially type. Try typing cr and hit TAB to see it get auto-completed to crypto. Hitting the TAB key twice can be used to see a list of all the possible things you can type from whatever partially-typed text you have. For example, type a and hit TAB twice to see all the available global scope objects that begin with a.

Figure 5. Node REPL Auto-complete

This is great if you need to type less and avoid typing mistakes, but it gets better. You can use the TAB key to inspect the methods and properties available on any object. For example, type Array. and hit TAB twice to see all the methods and properties that you can use with the JavaScript Array class. This works with Node modules as well. Try it with fs. or http..

It even works with objects that you create. For example, create an empty array using let myArr = [];, then type myArr. and hit TAB twice to see all the methods available on an array instance.

Figure 6. Exploring methods with auto-complete

TAB discoverability works on the global level too. If you hit TAB twice on an empty line, you get a list of everything that is globally available.

Figure 7. Hitting TAB twice on an empty line

This is a big list, but it’s a useful one. It has all the globals in the JavaScript language itself (like Array, Number, Math, etc.), all the globals from Node (like process, setTimeout, etc.), and all the core modules that are available natively in Node (like node:fs, node:http, etc.).

Tip

In the list of all global things, you’ll notice an underscore character (_). This is a handy REPL session variable that stores the value of the last evaluated expression. For example, after executing a Math.random() line, you can type _ to access that same random value. You can even use it in any place where you use a JavaScript expression. Try let random = _;.

You can use the node:repl module to create your own custom REPL server. You can customize many things like the prompt, the input and output streams, whether to use colors or not, and a few more options. You can also attach your own global context objects to it.

Here’s a custom REPL example that’ll start a REPL session with a different prompt, in strict mode, and it’ll not output the return value if it’s undefined. It’ll also make the lodash library available globally in your custom REPL sessions:

import { start, REPL_MODE_STRICT } from 'repl';
import lodash from 'lodash';

const replServer = start({
  prompt: '... ',
  ignoreUndefined: true,
  replMode: REPL_MODE_STRICT,
});

replServer.context.lodash = lodash;
Figure 8. Using a custom REPL server

Node Modules

The word module means a reusable piece of code: something you can include and use in any application, as many times as you need.

In Node, the word script is usually used for a piece of code that’s executed once with the node command. Any other files or folders that are required or imported are what’s referred to as modules.

When you specify a module as a dependency, Node goes through a few key steps to complete the module loading process: resolution and reading of the module contents, isolating the module scope, executing the module code, and caching the module.

Module Resolution

Node uses the following procedure to determine how to find a module that is being imported.

If the module name does not start with a . (denoting a relative path) or a / (denoting an absolute path), Node will first check if the module is a built-in one. If it is, it’ll load and execute it directly.

If the module is not a built-in one, Node will look for it under node_modules folders starting from the location where the importing module is, and going up in the folder hierarchy. For example, if the importing module is in /User/samer/efficient-node/src, Node will first look under src for a node_modules folder. If it does not find the module there, it’ll look next under efficient-node, and so on all the way to the root path.

You can use this lookup procedure to localize module dependencies by having multiple node_modules folders in your project, but that generally increases the complexity of the project. You can also use this lookup procedure to have multiple projects share a node_modules folder by placing that folder in a parent folder common to all projects, or even have a global node_modules folder for all projects on one server. While this might be useful in some cases, having a single node_modules folder per project is the standard and recommended practice.

If the imported module starts with a . or /, Node will look for it in the relative or absolute folder specified by the path.

Tip

For CommonJS modules, if you set the NODE_PATH environment variable before executing a script, Node will also search the paths specified by NODE_PATH (which can be a single path, or multiple paths separated by a colon on Linux and macOS, or a semicolon on Windows) for any required modules that it cannot find elsewhere. This can be useful to use short absolute paths instead of confusing relative ones.

If you need to only resolve the module and not execute it, you can use the require.resolve() function for CommonJS modules, or the import.meta.resolve() function for ES modules. These functions do not load the module. They just return where it resolves to: require.resolve() returns the resolved file path and throws an error if the module cannot be found, while import.meta.resolve() returns the resolved URL as a string.
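The resolution rules above can be explored from a CommonJS module with a short sketch like the following. The some-package and does-not-exist.cjs names are placeholders and are not expected to exist. The require.resolve.paths function returns the list of folders Node would search for a given name.

```javascript
// A built-in module resolves to its own name. No file lookup is needed.
console.log(require.resolve('fs')); // fs

// For a plain name, list the folders Node would search, in order.
// The list starts near the current module and walks up the hierarchy.
const searchPaths = require.resolve.paths('some-package');
console.log(searchPaths);

// A relative path that does not exist makes require.resolve() throw.
let missingCode;
try {
  require.resolve('./does-not-exist.cjs');
} catch (error) {
  missingCode = error.code;
}
console.log(missingCode); // MODULE_NOT_FOUND
```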

Module Loading

Once the path of a module is resolved successfully, Node will read the content of the module and determine its type.

A module can be a CommonJS module or an ES module. Supported file extensions are .js, .cjs, .mjs. It can be a single file or a directory with a package.json that specifies what files in the directory can be imported.

A module can also be a JSON file (.json extension). When you import a JSON file, you get a JavaScript object representing the data in that JSON file.

// In CommonJS modules:
const data = require('./file.json');

// In ES modules with static import:
import data from './file.json'
  with { type : 'json' };

// In ES modules with dynamic import:
const { default: data } = await import('./file.json', {
  with: { type: 'json' },
});
Tip

The with keyword in this example is used to specify the type import attribute. The import attributes feature gives the runtime instructions on how a module should be loaded. It’s a security measure: the module is only loaded if it really is of the declared type, which prevents a file you expect to be plain data from being executed as code. It can be used with other module types as well (for example, a "css" module in a browser).

A module can also be a Node addon compiled file. Node addons are dynamically-linked objects implemented in a low-level language like C or C++ and compiled to be loaded as ordinary Node modules. Node has an API known as Node-API that’s dedicated to building native addons. It’s independent from the underlying JavaScript runtime. If you need a module with high performance, or you need it to access system resources or integrate with C/C++ libraries, you can use Node-API to build an addon and use it as you would use any other Node module.

Warning

Addons are not supported with ES module imports. They can instead be loaded using the module.createRequire() function.

Module Scoping

JavaScript functions can be called with any number of arguments. The arguments keyword can be used to access the list of all arguments a function is called with.

Figure 9. The implicit arguments object
Tip

If you do need to have a function with a dynamic number of arguments, you should use explicit rest parameters instead of the implicit arguments keyword.
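Here is a short comparison of the two approaches. Both function names are made up for this example.

```javascript
// Using the implicit arguments object (works in regular functions only).
function sumWithArguments() {
  let total = 0;
  for (const value of arguments) {
    total += value;
  }
  return total;
}

// Using explicit rest parameters (also works in arrow functions).
// The numbers parameter is a real array, so array methods are available.
const sumWithRest = (...numbers) =>
  numbers.reduce((total, value) => total + value, 0);

console.log(sumWithArguments(1, 2, 3)); // 6
console.log(sumWithRest(1, 2, 3)); // 6
```

The rest parameter version makes it clear from the function signature that any number of arguments is expected.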

Node wraps all CommonJS modules with a function to give them a private scope. That wrapping function is called with five implicit arguments. To see that in action, print the value of the arguments keyword in the top-level scope of a CommonJS module.

Figure 10. CommonJS module wrapping

These five implicit arguments are (in order): exports, require, module, __filename, and __dirname. When you use these within a CommonJS module, you are not using a global variable, you’re using an argument from the implicit wrapping function.

The exports, require, and module arguments are Node’s way to manage a CommonJS module’s API and its dependencies. The __filename value has the full path of the module file. The __dirname value has the path to the directory where the module file is located.
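To get a feel for the wrapping idea, here is a simplified imitation of it. This is not how Node’s loader is actually implemented. The loadModule function, the fakeModule object, and the /tmp/demo/fake.cjs path are all stand-ins I made up, and the module code is passed as a string instead of being read from a file.

```javascript
const path = require('node:path');

// Wrap a string of module code in a function that receives
// the five arguments, call it, then return what it exported.
function loadModule(code, filename) {
  const fakeModule = { exports: {} };
  const wrapper = new Function(
    'exports', 'require', 'module', '__filename', '__dirname',
    code,
  );
  wrapper(
    fakeModule.exports,
    require,
    fakeModule,
    filename,
    path.dirname(filename),
  );
  return fakeModule.exports;
}

const loaded = loadModule(
  'exports.argCount = arguments.length; exports.dir = __dirname;',
  '/tmp/demo/fake.cjs',
);

console.log(loaded.argCount); // 5
console.log(loaded.dir); // /tmp/demo
```

Variables declared inside the code string stay local to the wrapper function, which is the same effect that gives a CommonJS module its private scope.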

Similar to CommonJS module wrapping, ES modules are executed in an implicit scope but there is no wrapping function and the five implicit arguments are not defined at all. Instead, an ES module API and dependencies are managed with import/export statements.

If you need to access the file name or directory name of an ES module, you can use import.meta.filename and import.meta.dirname.

With this scoping in modules, all the variables you define in a module are local to that module. If you need to define a global variable, you can use the global scope object globalThis. Any properties you add to that object become global variables. It’s good to know that you can do that but you should avoid using global variables as they can be problematic for many reasons.

Module Execution

This is the step where Node will execute the code in a module and finalize its dependencies and exports.

One common coding practice is to put any configurable variables that are used to seed or run an application into their own modules. An example of such configurable variables are the PORT and HOST on which a web server will run.

Let’s create a config.cjs file to host these two configurable variables. The .cjs extension makes it a CommonJS module that will be wrapped for scoping. This module will have the five implicit arguments.

Note

I’ll provide the equivalent ES module syntax below the CommonJS module syntax. You can use the .mjs extension if you want to test the examples with ES modules.

The exports argument will start out as an empty object.

Figure 11. The exports argument

To define the API of the config.cjs module, we just define properties on the exports object. Properties can be static values or any other type of object in JavaScript (like a function, a class, or a promise).

console.log('Loading config.cjs');

exports.PORT = process.env.PORT ?? 3000;
exports.HOST = process.env.HOST ?? 'localhost';
exports.SERVER_URL = (
  protocol = process.env.PROTOCOL ?? 'http',
) => `${protocol}://${exports.HOST}:${exports.PORT}`;

// In ES modules (there is no exports object,
// so the exported constants are used directly)
export const PORT = process.env.PORT ?? 3000;
export const HOST = process.env.HOST ?? 'localhost';
export const SERVER_URL = (
  protocol = process.env.PROTOCOL ?? 'http',
) => `${protocol}://${HOST}:${PORT}`;

Note how I used process.env variables to make the configurations customizable on different environments. I also made SERVER_URL a function that receives a protocol argument, which is customizable through the environment as well. Making a configuration value a function allows it to be customizable at run time.

When we require this config.cjs module in another module, the require function call returns the exports object. Let’s test that in an index.cjs file:

const config = require("./config.cjs");
console.log(config);

// Or we can use destructuring
// const { PORT, HOST } = require("./config.cjs");

// In ES modules (config.mjs has only named exports,
// so we import all of them as a namespace object)
import * as config from "./config.mjs";
console.log(config);

// Or we can use named imports
// import { PORT, HOST } from "./config.mjs";
Figure 12. Requiring a CommonJS module

Now we can say that the index.cjs module depends on the config.cjs module. This is where the term dependency management comes from. We are managing the dependencies of a module here and bringing one module’s API to use in another module.

The exports argument in CommonJS modules is actually an alias to module.exports. The latter is what’s returned when we invoke the require function. In some cases, you might need the top-level API object to be a function or a class, or anything else that’s not a simple aliased object. In these cases, you’ll need to change the value of module.exports itself to define your special API.
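The aliasing behavior is plain JavaScript, so it can be illustrated without the module system at all. In this sketch, fakeModule and fakeExports are stand-ins for the real module and exports arguments.

```javascript
// Stand-ins for the module and exports arguments of a CommonJS module.
const fakeModule = { exports: {} };
let fakeExports = fakeModule.exports; // an alias to the same object

// Adding a property through the alias changes the shared object.
fakeExports.PORT = 3000;
console.log(fakeModule.exports.PORT); // 3000

// Reassigning the alias only rebinds the local variable.
// The object that would be returned by require is unchanged.
fakeExports = () => 'I am a function';
console.log(typeof fakeModule.exports); // object

// Reassigning module.exports itself is what changes the API.
fakeModule.exports = () => 'I am a function';
console.log(typeof fakeModule.exports); // function
```

This is why assigning a new value to exports in a real module has no effect on what other modules receive.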

For example, let’s say that we want all the configuration properties to be the result of executing a function rather than a direct object. This might be helpful for testing as we can mock the configuration function differently for different tests. To make the top-level API object a function, you need to use module.exports. Here’s an example of how we can do that for config.cjs:

module.exports = () => {
  // Read the values once per call so SERVER_URL can use them
  const PORT = process.env.PORT ?? 3000;
  const HOST = process.env.HOST ?? 'localhost';
  return {
    PORT,
    HOST,
    SERVER_URL: (protocol = process.env.PROTOCOL ?? 'http') =>
      `${protocol}://${HOST}:${PORT}`,
  };
};

// In ES modules
export default () => {
  // Read the values once per call so SERVER_URL can use them
  const PORT = process.env.PORT ?? 3000;
  const HOST = process.env.HOST ?? 'localhost';
  return {
    PORT,
    HOST,
    SERVER_URL: (protocol = process.env.PROTOCOL ?? 'http') =>
      `${protocol}://${HOST}:${PORT}`,
  };
};

With that, to use the configuration value in index.cjs, we’ll need to invoke what we get from the require function:

const config = require('./config.cjs');

console.log(
  config(), // Note how we are invoking this
);
// In ES Modules:
import config from './config.mjs';

console.log(
  config(), // Note how we are invoking this
);

This method is often helpful when you need to use the dependency injection design pattern, which is when some modules are injected into other modules to create more flexibility and make modules more reusable.
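As a rough sketch of that pattern, the following function receives the configuration function as an argument instead of requiring it directly. The createUrlBuilder name, the /health path, and the test.local host are all invented for this example.

```javascript
// The dependency (a function that returns the configuration)
// is passed in rather than required inside this module.
function createUrlBuilder(getConfig) {
  return (pathname = '/') => {
    const { HOST, PORT } = getConfig();
    return `http://${HOST}:${PORT}${pathname}`;
  };
}

// In a test, we can inject a fake configuration function.
const buildUrl = createUrlBuilder(() => ({
  HOST: 'test.local',
  PORT: 8080,
}));

console.log(buildUrl('/health')); // http://test.local:8080/health
```

In the application itself, you would pass the function exported by the configuration module instead of the fake one.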

If you need to make a Node module executable from the CLI as a script, you can use the require.main property to check if the module is being run directly. The require.main value will equal the module argument in that case. The following figure has an example of a simple module using that check to determine what to do.

Figure 13. The require.main check
Warning

ES modules have no equivalent simple check to determine if they are run directly, but import.meta.url can be used along with process.argv to do a similar check. The es-main npm package has a good implementation of that.

Module Caching

To understand another concept about how Node modules work, let’s repeat the require line in index.cjs multiple times:

require('./config.cjs');
require('./config.cjs');
require('./config.cjs');

Given these three require lines, when we execute index.cjs, how many times will the "Loading config.cjs" line in config.cjs be outputted?

The answer is not three times. It’ll only be outputted once.

Figure 14. Node module caching

Both CommonJS modules and ES modules in Node are cached after the first call. A module is executed the first time you require or import it, then when you import it again, Node loads it up from a cache.
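For CommonJS modules, you can observe the cache directly through the require.cache object. The sketch below writes a throwaway module to a temporary folder so that it is self-contained. The counter.cjs file name and the loadCount global are made up for this demonstration.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Create a temporary module that counts how many times it is executed.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cache-demo-'));
const file = path.join(dir, 'counter.cjs');
fs.writeFileSync(
  file,
  'globalThis.loadCount = (globalThis.loadCount ?? 0) + 1;\n' +
    'module.exports = { createdAt: Date.now() };\n',
);

const first = require(file);
const second = require(file);

// Both calls return the very same object, and the code ran only once.
console.log(first === second); // true
console.log(globalThis.loadCount); // 1

// The cache is keyed by the resolved path of the module file.
const isCached = Object.keys(require.cache).includes(require.resolve(file));
console.log(isCached); // true
```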

If you look at front-end applications, like React for example, all component files import the React module, and that’s okay because only the first import will do the work. The rest will use the cache.

But what if you do want the console.log message to show up every time we require config.cjs?

You can make the top export of config.cjs a function instead of an object, put all the code there inside the function, and call that function every time you need the code to be executed. The cache, in that case, will cache the definition of the function.

Summary

Node CLI has many powerful options that we can control. We can pass arguments to it and set environment variables before running it. Both of these options allow us to pass data from the operating system environment to a running Node process. Node’s process object is the bridge.

Node’s REPL mode is a good way to test simple expressions, explore everything you can use in Node, and take a quick look at the API of anything, including core modules, installed modules, and even objects you instantiate.

CommonJS modules in Node are implicitly wrapped in a function and are passed five arguments. ES modules have a private scope as well.

We use the exports object in CommonJS modules, or export statements in ES modules to define the API of a module. Modules that need to depend on other modules use the require function or import statements to access a dependency API.

Node manages a cache for all modules. To discover where a module is, Node follows a predefined set of rules depending on the path of the module. A path can be a relative one, an absolute one, or just a name. For the latter case, Node looks for the module in node_modules folders.

In the next chapter, we’ll do a deep dive into how Node handles asynchronous operations and learn about the event-driven nature of Node modules.