Skip to content
Gabx edited this page Jul 11, 2016 · 6 revisions

PostgreSQL

PostgreSQL is a relational database management system (RDBMS). That means it is a system for managing data stored in relations. Relation is essentially a mathematical term for table. The notion of storing data in tables is so commonplace today that it might seem inherently obvious, but there are a number of other ways of organizing databases. Files and directories on Unix-like operating systems form an example of a hierarchical database. A more modern development is the object-oriented database.

Each table is a named collection of rows. Each row of a given table has the same set of named columns, and each column is of a specific data type. Whereas columns have a fixed order in each row, it is important to remember that SQL does not guarantee the order of the rows within the table in any way (although they can be explicitly sorted for display).

Tables are grouped into databases, and a collection of databases managed by a single PostgreSQL server instance constitutes a database cluster.

PostgreSQL vs MySQL:

  • MySQL has the most bizarre security leaks and defaults (you need to install a package called mysql_secure_installation to...well...secure MySQL).
  • Feed MySQL with a million records and it will die a horrible, resource eating death.
  • Try rebooting a crashed database (after its horrible death), you will love the way your tables have "crashed and burned".
  • Data integrity and reliability...MySQL will give you a looooong blank stare.

NOTE: There is a postgresql Red Hat familly repo which can be very useful to solve upgrade issues. It will allow you to download previous major versions.

Initialize DB

Postgresql default data directory is usually located in /var/lib/pgsql. This has been replaced by /db/pgsql, place for all of our Data bases.

postgres user has to be created first.

Before PostgreSQL can function correctly, the database cluster must be initialized:

# mkdir -p /db/pgsql
# chown -R postgres:postgres /db/pgsql
% sudo -i -u postgres
-bash-4.3$ initdb -D /db/postgres/data
..............
Success. You can now start the database server using:
    postgres -D /db/postgres/data

NOTES :

  • postgresql superuser is name postgres.
  • postgres user is not member of wheel group and can not sudo.
  • the command will fill the /db/pgsql directory with a data and backups directories.
  • it is possible to initialize as many data bases if using another location (e.g /db/pgsql/data1)
  • to avoid typing password for user postgres at each command from unix shell, a ~/.pgpass has been created. This file accepts multi lines in this format:
hostname:port:database1:username:password
hostname:port:database2:username:password

Upgrade

Upgrading minor versions requires some extra maintenance. Some commands have to be run manually to be able to use the old database with the new major postgresql version. It is thus strongly advise to manually do the upgrade with pg_upgrade and not let dnf do it.

Configuration

Check that permissions are correctly set:

# ls -al /db/pgsql
drwx------ 1 postgres postgres   64 May 31 13:48 .
drwxrwxr-x 1 root     root       36 May 31 13:11 ..
drwx------ 1 postgres postgres   40 May 31 13:48 backups
drwx------ 1 postgres postgres  502 May 31 13:48 data
-rw------- 1 postgres postgres 1348 May 31 13:48 initdb_postgresql.log

Configuration files

The postgresql server is using two main configuration files

/db/pgsql/data/postgresql.conf
/db/pgsql/data/pg_hba.conf

NOTES:

  • custom settings in posthresql.conf has been made following this site. More setting details below this page.
  • host-based authentication is configured in pg_hba.conf. This file controls which hosts are allowed to connect.
  • postgresql socket location: /run/postgresql/.s.PGSQL.5432=. Permission is set to 0777
  • the location of postgresql data is a system variable called PGDATA. This variable is exported by /etc/profile.d/custom-profile.sh
  • after each modification of one of these files, you need to restart posgresql:
# systemctl restart postgresql

  • Modify. The default is:
/db/pgsql/data/pg_hba.conf
-----------------------------
# TYPE  DATABASE        USER            ADDRESS                 METHOD

# "local" is for Unix domain socket connections only
local   all             all                                     md5
# IPv4 local connections:
host    all             all             127.0.0.1/32            md5
# IPv6 local connections:
 host    all             all             ::1/128                 md5

But, this would enable all databases to be connected to on the tcp/ip protocol. This could create a possible backdoor. So to tighten things up we need to set the database user and database name in the file.

Example:

/db/pgsql/data/pg_hba.conf
-----------------------------
# TYPE  DATABASE        USER            ADDRESS                 METHOD

local     postfix      mailreader                                md5

group and role

group

A enlgroup has been created with these privileges:

  • create DB
  • create Role
  • login

role

Role mediawiki has been created and is member of group enlglobal

List roles and attributes:

postgres=# \du
                              List of roles
 Role name |                   Attributes                   |  Member of
-----------+------------------------------------------------+-------------
 enlglobal | Create role, Create DB                         | {}
 mediawiki |                                                | {enlglobal}
 postgres  | Superuser, Create role, Create DB, Replication | {}

data base directory management

Switch user to interact with postgres. The user must be listed as a role in postgresql data base.

% sudo -i -u MyUser
Password: XXXXXX
-bash-4.3$ psql
Password:XXXXXX
psql (9.4.1)
Type "help" for help.
MyUser=# 

Troubleshootings

journalctl -u postgresql

Logs:

# cat /storage/log/postgresql/postgresql-XXX.log

Start postgresql

# systemctl start postgresql

Systemd unit file:

/usr/lib/systemd/system/postgresql.service
-------------------------------------------------
# It's not recommended to modify this file in-place, because it will be
# overwritten during package upgrades.  It is recommended to use systemd
# "dropin" feature;  i.e. create file with suffix .conf under
# /etc/systemd/system/UNITNAME.service.d directory overriding the
# unit's defaults.  Look at systemd.unit(5) manual page for more info.

[Unit]
Description=PostgreSQL database server
After=network.target

[Service]
Type=forking

User=postgres
Group=postgres

# Where to send early-startup messages from the server (before the logging
# options of postgresql.conf take effect)
# This is normally controlled by the global default set by systemd
# StandardOutput=syslog

# Disable OOM kill on the postmaster
OOMScoreAdjust=-1000
# ... but allow it still to be effective for child processes
# (note that these settings are ignored by Postgres releases before 9.5)
Environment=PG_OOM_ADJUST_FILE=/proc/self/oom_score_adj
Environment=PG_OOM_ADJUST_VALUE=0

Environment=PGDATA=/db/postgres/data

ExecStartPre=/usr/libexec/postgresql-check-db-dir %N

# Temporarily use postgresql-ctl wrapper around pg_ctl until there will
# exist clients which set PGPORT in service file.
ExecStart=/usr/libexec/postgresql-ctl start -D ${PGDATA} -s -w -t 300
ExecStop=/usr/bin/pg_ctl stop -D ${PGDATA} -s -m fast
ExecReload=/usr/bin/pg_ctl reload -D ${PGDATA} -s

# Give a reasonable amount of time for the server to start up/shut down
TimeoutSec=300

[Install]
WantedBy=multi-user.target

NOTE: systemd environment is not the same as the system environment. Thus, it is needed to explicitly add some Environment= in the unit file.


% systemctl status postgresql
● postgresql.service - PostgreSQL database server
   Loaded: loaded (/usr/lib/systemd/system/postgresql.service; disabled; vendor preset: disabled)
   Active: active (running) since Fri 2015-05-08 16:01:01 CEST; 6s ago
  Process: 13296 ExecStart=/usr/libexec/postgresql-ctl start -D ${PGDATA} -s -w -t 300 (code=exited, status=0/SUCCESS)
  Process: 13294 ExecStartPre=/usr/libexec/postgresql-check-db-dir %N (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/system-systemd\x2dnspawn.slice/[email protected]/system.slice/postgresql.service
           ├─13300 /usr/bin/postgres -D /var/lib/pgsql/data
           ├─13311 postgres: logger process   
           ├─13325 postgres: checkpointer process   
           ├─13326 postgres: writer process   
           ├─13327 postgres: wal writer process   
           ├─13328 postgres: autovacuum launcher process   
           └─13329 postgres: stats collector process   

May 08 16:00:39 poppy systemd[1]: Starting PostgreSQL database server...
May 08 16:00:50 poppy postgresql-ctl[13296]: LOG:  redirecting log output to logging collector process
May 08 16:00:50 poppy postgresql-ctl[13296]: HINT:  Future log output will appear in directory "pg_log".
May 08 16:01:01 poppy systemd[1]: Started PostgreSQL database server.

Data base management

from user postgres shell

pg_dump is a utility for backing up a PostgreSQL database. It makes consistent backups even if the database is being used concurrently.

Dumps can be output in script or archive file formats. Script dumps are plain-text files containing the SQL commands required to reconstruct the database to the state it was in at the time it was saved. To restore from such a script, feed it to psql. Script files can be used to reconstruct the database even on other machines and other architectures; with some modifications, even on other SQL database products.

The alternative archive file formats must be used with pg_restore to rebuild the database. They allow pg_restore to be selective about what is restored, or even to reorder the items prior to being restored. The archive file formats are designed to be portable across architectures.

Dump

  • dump a whole data base in your current directory. Will need to use psql to restore
[postgres@poppy]~/backups% pg_dump myDB  > myDB.sql 
  • dump in custom format (-Fc) with no access privileges dumped (-x) in your current directory. Can be restored with pg_restore.
[postgres@poppy]~/backups% pg_dump -x -Fc myDB  > myDB.dump

TIP: best is to always dump everything in a custom format. Table and data only can thereafter be choosen with pg_restore.

Restore

  • restore data only (-a) from a specific table (-t) and a specific schema (-n) into database (-d)
postgres@poppy ➤➤ backups/mediawiki % pg_restore -a -d myDB -n mySchema -t myTable < myDB.dump

TIP: use --disable-triggers to ignore constraints.

from postgres session shell

WARNING: cross-database references are not implemented. To work on a specific database, first connect to it.

BASIC COMMANDS

  • Log from Unix session $ psql -d MyDb -U myUser

  • List tables in DB:

\dt MySchema.*
\dt+ MySchema.*  <--- give size and description
  • List content in table:
SELECT * FROM MySchema.MyTable;
  • create role and data base:
postgres=# CREATE USER myUser WITH PASSWORD 'XXXXX';
CREATE ROLE
postgres=# CREATE DATABASE myDB TEMPLATE template0 OWNER myUser;
CREATE DATABASE

NOTE : CREATE USER will allow by default the user to login. CREATE ROLE will not allow by default, WITH LOGIN must be specified.

  • List role attribute
postgres=# \du
                             List of roles
 Role name |                   Attributes                   | Member of 
-----------+------------------------------------------------+-----------
 mailman   |                                                | {}
 mediawiki |                                                | {}
 postgres  | Superuser, Create role, Create DB, Replication | {}
 vmail     | 

postgres=# \du mailman
           List of roles
 Role name | Attributes | Member of 
-----------+------------+-----------
 mailman   |            | {}
  • insert new column
myDB=# ALTER TABLE mySchema.myTable ADD COLUMN myColName ColumnType NOT NULL ;

TIP: if you have this error ERROR: column "myColName" contains null values, then it is probably because you droped this column sometimes earlier. The trick is to first enter a default value, then field the column and remove the default value.

myDB=# ALTER TABLE mySchema.myTable ADD COLUMN myColName ColumnType NOT NULL DEFAULT 'foo';
.....
do some work
ALTER TABLE mySchema.myTable ALTER COLUMN myColName DROP DEFAULT;
  • Update value in a column
UPDATE MyTable SET myColName='xxyyc' WHERE myColName1 = myValue ;
  • grand privileges:
myDB=# GRANT ALL ON DATABASE myDB TO myUser;
myDB=# GRANT ALL ON SCHEMA mySchema TO myUser;
myDB=# GRANT ALL ON ALL TABLES IN SCHEMA mySchema TO myUser;
myDB=# ALTER DATABASE myDB OWNER TO myUser ;

This command define access privileges.

  • connect to a database as user:
postgres=# \c MyDataBase MyUser
You are now connected to database "MyDataBase" as user "MyUser".
MyDataBase=>
  • disconnect from data base
enl=# \c postgres postgres
You are now connected to database "postgres" as user "postgres".
postgres=# 

There is no disconnect command

  • print help and quit
postgres=# \help
postgres=# \quit
  • display the privileges granted on existing tables and columns
postgres=# \dp
 Schema | Name | Type | Access privileges | Column access privileges 
--------+------+------+-------------------+--------------------------
  • remove database
postgres=# DROP DATABASE MyDb;
  • show db directory
postgres=# SHOW data_directory;
 data_directory 
----------------
 /db/pgsql/data
(1 row)
  • List database:
postgres=# \l
                                       List of databases
      Name       |      Owner      | Encoding |  Collate   |   Ctype    |   Access privileges
-----------------+-----------------+----------+------------+------------+-----------------------
 django_slacklog | django_slacklog | UTF8     | en_US.UTF8 | en_US.UTF8 | 
 enlwiki         | mediawiki       | UTF8     | en_US.UTF8 | en_US.UTF8 | 
 postgres        | postgres        | UTF8     | en_US.UTF8 | en_US.UTF8 | 
 template0       | postgres        | UTF8     | en_US.UTF8 | en_US.UTF8 | =c/postgres          +
                 |                 |          |            |            | postgres=CTc/postgres
 template1       | postgres        | UTF8     | en_US.UTF8 | en_US.UTF8 | 
(5 rows)
  • get size of data base
postgres=# SELECT pg_size_pretty(pg_database_size('myDB'));
 pg_size_pretty 
----------------
 11 MB
(1 row)
  • get size of all data bases
postgres=# SELECT pg_database.datname, 
pg_size_pretty(pg_database_size(pg_database.datname)) AS size FROM pg_database;
  • list role and attributes:
postgres=# \du
                                List of roles
    Role name    |                   Attributes                   | Member of 
-----------------+------------------------------------------------+-----------
 enlglobal       | Create role, Create DB                         | {}
 django_slacklog |                                                | {}
 mediawiki       |                                                | {enlglobal}
 postgres        | Superuser, Create role, Create DB, Replication | {}
  • Alter role
postgres=# ALTER ROLE RoleName WITH .... ;

Please refer to Postgresql doc to a complete list of commands to alter role.

  • Remove role
postgres=# DROP ROLE roleName;
  • delete table row with condition
myDB=# DELETE FROM mySchema.myTable WHERE columnName [CONDITION];

E.g:

myDB=# DELETE FROM mywiki.page WHERE page_touched > '2015-01-08 17:28:48+01';
  • garbage-collect and optionally analyze a database
myDB=# VACUUM (ANALYZE) mySchema.myTable ;
  • empty a whole table. Use TRUNCATE
myDB=# TRUNCATE mySchema.myTable CASCADE;
  • create schema for a data base and add to your path
postgres=# \c myDB
You are now connected to database "myDB" as user "postgres".
myDB=# CREATE SCHEMA mySchema AUTHORIZATION myRole;
myDB=# SET search_path TO mySchema;
myDB=# SHOW search_path;
 search_path 
-------------
 mySchema
(1 row)

A schema is essentially a namespace: it contains named objects (tables, data types, functions, and operators) whose names can duplicate those of other objects existing in other schemas.

  • to create or access objects in a schema write a qualified name consisting of the schema name and table name separated by a dot: schema.table

  • list schema of enlwiki data base

enlwiki=# \dn
  List of schemas
  Name  |   Owner
--------+-----------
 mywiki | mediawiki
 public | mediawiki
(2 rows)
  • drop constraint (key)
myDB=# ALTER TABLE mySchema.myColumn DROP CONSTRAINT keyName CASCADE;

From Web interface

The postgresql data base can be managed from web interface phppgadmin URL. postgres user is not allowed to connect because of security reasons.

import data base fro Heroku

Once a dump file has been downloaded from Heroku:

postgres@poppy $ pg_restore --verbose --clean --no-acl --no-owner -h localhost -d mediawiki HerokuDumpFile

This will create a new database called mylocaldb wich will dump MyApp postgresql db. The mylocaldb shall NOT exists before the command is run.

Command: pg_restore --verbose --clean --no-acl --no-owner -h localhost -d [db_name] production.dump

Customization settings

They all are done in the /db/pgsql/data/postgresql.con. The settings have been based according to a 32G RAM box.

Shared memory

Shared memory allows processes to access common structures and data by placing them in shared memory segments. It is the fastest form of inter-process communication available since no kernel involvement occurs when data is passed between the processes. In fact, data does not need to be copied between the processes.

To see all shared memory settings, execute:

 % ipcs -lm

------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 18014398509465599
max total shared memory (kbytes) = 18014398442373116
min seg size (bytes) = 1

SHMMAX Parameter

Setting SHMMAX Parameter defines the maximum size in bytes of a single shared memory segment that a Linux process can allocate in its virtual address space

To determine the maximum size of a shared memory segment, run:

# cat /proc/sys/kernel/shmmax
18446744073692774399

Then:

# echo "kernel.shmmax=18446744073692774399" >> /etc/sysctl.conf

SHMMNI Parameter

This parameter sets the system wide maximum number of shared memory segments. 4096 is the recommended value

# cat /proc/sys/kernel/shmmni
4096

SHMALL Parameter

This parameter sets the total amount of shared memory pages that can be used system wide. Hence, SHMALL should always be at least ceil(shmmax/PAGE_SIZE)

In most cases this setting should be sufficient since it means that the total amount of shared memory available on the system is 20971524096 bytes (shmallPAGE_SIZE) which is 8 GB. PAGE_SIZE is usually 4096 bytes

echo "kernel.shmall=2097152" >> /etc/sysctl.conf

Extension

CREATE EXTENSION loads a new extension into the current database.

To have an effective monitoring you should install the following extensions : pg_stat_statements, pg_buffercache

# dnf install postgresql-contrib

Then log as postgres user

postgres=# CREATE EXTENSION pg_buffercache;
CREATE EXTENSION
postgres=# CREATE EXTENSION pg_stat_statements;
CREATE EXTENSION

List extension

postgres=# \dx
                                     List of installed extensions
        Name        | Version |   Schema   |                        Description  
                      
--------------------+---------+------------+-------------------------------------
----------------------
 pg_buffercache     | 1.0     | public     | examine the shared buffer cache
 pg_stat_statements | 1.2     | public     | track execution statistics of all SQ
L statements executed
 plpgsql            | 1.0     | pg_catalog | PL/pgSQL procedural language

Then update /db/postgres/data/postgresql.conf

shared_preload_libraries = 'pg_stat_statements, auto_explain'
pg_stat_statements.max = 10000
pg_stat_statements.track = top
auto_explain.log_min_duration = -1
auto_explain.log_analyze = 'false'

Performance monitoring

Pgcluu, a PostgreSQL Cluster performances monitoring and auditing tool has been installed.

pgCluu comes with two Perl scripts:

  • one for PostgreSQL and System metrics collector
pgcluu_collectd [options] output_dir
  • one for report generator.
pgcluu [options] -o report_dir input_dir

Configuration

Output directory has been created as /storage/log/psqlStat. To analyze a special DB, make a new directory /storage/log/psqlStat/MyDb1.

Generated reports directory has been created as /storage/log/psqlReport

Start the service

A systemd service file has been created

/etc/systemd/system/pgcluu-collectd.service
--------------------------------------------
[Unit]
Description=Postgresql performance and auditing
Requires=postgresql.service
After=postgresql.service

[Service]

User=postgres
Group=postgres
PIDFile=/tmp/pgcluu_collectd.pid
Environment=OUTPUTDIR=/storage/log/psqlStat
Environment=PGPASSWORD=XXXXXX
ExecStart=/usr/local/bin/pgcluu_collectd $OUTPUTDIR
ExecStop=/usr/local/bin/pgcluu_collectd -k

[Install]
WantedBy=multi-user.target

Start the service:

# systemctl start pgcluu-collectd

Generate and see the report

To generate the report:

# pgcluu rm /storage/log/psqlReport/*
# pgcluu -o /storage/log/psqlReport/ /storage/log/psqlStat/

To see the report, a Nginx conf file has been written:

/etc/nginx/conf.d/psqlstat.conf
----------------------------

server {
	listen 80;
	server_name localhost;
	root /storage/log/psqlReport;
	location / {
	index index.php index.html;
	}
}

This allows to see the report in html on localhost poppy.

Ressources