-
Notifications
You must be signed in to change notification settings - Fork 0
/
pldrm.nw
272 lines (250 loc) · 11.3 KB
/
pldrm.nw
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
\documentclass{scrartcl}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{amsthm}
\usepackage{hyperref}
\usepackage{caption}
\usepackage{tikz}
\usetikzlibrary{shapes,arrows}
\usepackage[parfill]{parskip}
\usepackage{noweb}
\pagestyle{noweb}
\begin{document}
\newcommand\ldotd{\mathinner{..}}
\newtheorem{lem}{Lemma}
\newtheorem{dfnt}{Definition}
\newtheorem{asst}{Assertion}
% Define block styles
\tikzstyle{block} = [rectangle, draw, fill=blue!20,
text width=5em, text centered, rounded corners, minimum height=4em]
\title{Jeuring's algorithm on palindromes}
\subtitle{An implement with literate programming and C++ programming language}
\maketitle{}
\section{Introduction}
All the techniques used in this article is well-introduced in Don Knuth's magnificent book~\cite{taocp}, which is strongly-suggested in Computer Science.
This is a very efficient algorithm, designed by Johan Jeuring, which is used to determine the longest palindrome in $O(n)$ time. You can read his original article~\cite{jj} and his pretty Haskell code. Now, I want to show how his algorithm works and why it's right.
As usual, a program (especially C/C++ code) is made up of three parts: preprocessors, global variables and routines.
<<pldrm.cc>>=
<<preprocessors>>
<<globals>>
<<main>>
@
It's essential to include the standard libraries, such as cstdio, cstdlib and cstring.
<<preprocessors>>=
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <algorithm>
using namespace std;
@
\section{Main routine}
Here's the main routine.
<<main>>=
int main()
{
<<mainvar>>
<<input>>
<<main loop>>
<<output>>
return 0;
}
@
Given that $s[1\ldotd n]$ is the string inputted, and $n$ is the length of $s$, where \mbox{$s[0]=1,s[n+1]=0$}.
<<preprocessors>>=
#define MAX_LEN 100000
@
<<globals>>=
int n;
char s[MAX_LEN+2];
@
<<input>>=
s[0] = 1;
scanf("%s", &s[1]);
n = strlen(&s[1]);
@
\begin{dfnt}
We say $l+r$ is the center of substring $s[l\ldotd r]$. For example, $4$ is the center of $s[2\ldotd2]$ or $s[1\ldotd3]$.
\end{dfnt}
$a_k$ is the length of the longest palindrome whose center is $k$, and $a[0\ldotd2n]$ is the array to save $\langle a_k\rangle$.
\emph{The algorithm is used to determine $a_k$.}
<<output>>=
for (int k=1; k<=2*n+1; k++) {
printf("%d\n", a[k]);
}
@
\section{Main loop}
\begin{dfnt}
We call some string $A$ is a tail palindrome of the other string $B$ if and only if $A$ is a palindromic tail substring of $B$. For example, $A=aba$ and $B=aaaababa$, where $A$ is palindromic and $A$ is a tail substring of $B$.
\end{dfnt}
The main loop calculate array $a$ in the increasing order of the index. We will see that the invariant of main loop is assertion \ref{asst1}.
<<globals>>=
int a[2*MAX_LEN+4];
@
<<mainvar>>=
int j, l;
<<main loop>>=
j = 1;
l = 1;
a[0] = 1;
a[1] = 0;
for (int k=2; k<=n+1; k++) {
<<process>>
advance:
;
}
@
Process is made up of an infinite loop, which is used to find the longest tail palindrome of $s[0\ldotd k]$. There are two exits of it. One is in extension subroutine, while the other one is in move loop. The way of exit is \emph{goto advance;}.
The loop invariant for process is assertion \ref{asst3}.
<<process>>=
for (;;) {
<<check>>
<<move loop>>
}
@
The check subroutine checks whether a tail palindrome of $s[0\ldotd k-1]$ could be extended to that of $s[0\ldotd k]$. If so, exit from the process loop and advance $k$, otherwise start the move loop.
<<check>>=
if (s[k] == s[k-l-1]) {
l += 2;
goto advance;
}
@
Here's the move loop, which looks short and easy. It's used to find a shorter tail palindrome of $s[0\ldotd k-1]$.
<<mainvar>>=
int p;
@
<<move loop>>=
a[++j] = l;
for (p=j-1; --l>=0&&l!=a[p]; p--) {
a[++j] = min(a[p], l);
}
if (l < 0) {
l = 1;
goto advance;
}
@
\section{The outline of the proof}
\begin{lem}
An arbitrary tail palindrome of $s[0\ldotd m]$ is also a tail palindrome of $s[1\ldotd m]$ for all $m>0$.
\end{lem}
\begin{lem}
The substring of $s$ whose center is $c$ and length is $l$ is \mbox{$s[(c-l+1)/2\ldotd(c+l-1)/2]$}.
\end{lem}
\begin{center}
\begin{tikzpicture}[->,>=stealth',node distance=4.5cm, auto]
% Place nodes
\node [block] (loop) {All computed?};
\node [block, left of=loop] (init) {Initiate};
\node [below of=init, node distance=3cm] (begin) {Begin};
\node [above of=loop, node distance=3cm] (end) {Assertion 7};
\node [block, right of=loop] (check) {Check};
\node [block, right of=check] (move) {Move};
\node [block, below of=check, node distance=3cm] (advance) {Advance};
% Draw edges
\path (begin) edge (init);
\path (init) edge node {Assertion 1} (loop);
\path (loop) edge node [near start] {Yes} (end);
\path (loop) edge node {Assertion 1} (check);
\path (check) [bend left] edge node {Assertion 2} (move);
\path (move) [bend left] edge node {Assertion 3} (check);
\path (check) edge node {Assertion 5} (advance);
\path (move) [bend left] edge node {Assertion 4} (advance);
\path (advance) edge node {Assertion 6} (loop);
\end{tikzpicture}
\captionof{figure}{Flowchart of the main loop}
\end{center}
Now let's label some arrows with an assertion on the flow chart of the main loop. This is a technique to prove the validity of the algorithm, introduced in Don Knuth's \emph{The Art of Computer Programming}~\cite{taocp}, section \S1.2.1, mathematical induction. I quote the most important statement on that book here:
\begin{quote}
if an assertion attached to any arrow leading into the box is true before the operation in that box is performed, then all of the assertions on relevant arrows leading away from the box are true after operation
\ldots\ldots
For we can now use induction on the number of steps of the computation, in the sense of the number of arrows traversed in the flow chart.
\end{quote}
Assertions are stated:
\begin{asst}\label{asst1}
$2\le k\le n+1$, and the length and the center of the longest tail palindrome of $s[0\ldotd k-1]$ is $l$ and $j+1$, while $a[0\ldotd j]$ is computed. (Notice that once assigned, $a[t]$ will not be reassigned, therefore that $a[t]$ is computed/calculated/determined means that $a[t]$ is assigned and $a[t]=a_t$.)
\end{asst}
\begin{asst}\label{asst2}
$2\le k\le n+1$, $s[k-l\ldotd k-1]$ is a palindrome or empty string, whose center is $j+1$, $l\ge0$, the length of the longest tail palindrome of $s[0\ldotd k]$ is less than $l+2$, and $a[0\ldotd j]$ is calculated.
\end{asst}
\begin{asst}\label{asst3}
$2\le k\le n+1$, $s[k-l\ldotd k-1]$ is a palindrome or empty string, whose center is $j+1$, $l\ge0$, the length of the longest tail palindrome of $s[0\ldotd k]$ is not more than $l+2$, and $a[0\ldotd j]$ is assigned.
\end{asst}
\begin{asst}\label{asst4}
$2\le k\le n+1$, the length and the center of the longest tail palindrome of $s[0\ldotd k]$ is $l=1$ and $j+1$, where $a[0\ldotd j]$ is determined.
\end{asst}
\begin{asst}\label{asst5}
$2\le k\le n+1$, the length and the center of the longest tail palindrome of $s[0\ldotd k]$ is $l\ge2$ and $j+1$, and $a[0\ldotd j]$ is calculated.
\end{asst}
\begin{asst}\label{asst6}
$3\le k\le n+2$, the length and the center of the longest tail palindrome of $s[0\ldotd k-1]$ is $l$ and $j+1$, and $a[0\ldotd j]$ is computed.
\end{asst}
\begin{asst}\label{asst7}
the length and the center of the longest tail palindrome of $s[0\ldotd k-1]$ is $l$ and $j+1$, and $[0\ldotd j]$ is determined, where $k=n+2,l=1$ and $j=2n+1$.
\end{asst}
\begin{center}
\begin{tikzpicture}[->,>=stealth',node distance=4.5cm, auto]
% Place nodes
\node [block] (init) {Init};
\node [below of=init, node distance=3cm] (a2) {Assertion 2};
\node [block, right of=init] (decl) {Decrease $l$};
\node [block, right of=decl] (fnd) {Found or hit bound?};
\node [block, below of=fnd, node distance=3cm] (fill) {Fill $a$};
\node [block, right of=fnd] (bnd) {Bound?};
\node [block, below of=bnd, node distance=3cm] (reass) {Reassign $l$};
\node [below of=reass, node distance=2cm] (a4) {Assertion 4};
\node [above of=bnd, node distance=2cm] (a3) {Assertion 3};
% Draw edges
\path (a2) edge (init);
\path (init) edge (decl);
\path (decl) edge node {Assertion 8} (fnd);
\path (fnd) edge node [near start] {No} (fill);
\path (fill) [bend left] edge (decl);
\path (fnd) edge node [near start] {Yes} (bnd);
\path (bnd) edge (reass);
\path (reass) edge (a4);
\path (bnd) edge (a3);
\end{tikzpicture}
\captionof{figure}{Flowchart of the move loop}
\end{center}
Now we observe a non-trivial lemma, which is the key to the move loop.
\begin{lem}
Suppose that $l=a_c$, $j$ is a positive integer such that $j\le l$, and $a_{c-j}\neq l-j$, we have
$a_{c+j}=\min(a_{c-j},l-j)$.
\end{lem}
Just like the flowchart of the main loop, we can make some assertions on the flow chart of the move loop. I will figure out the critical part, so that others could be discovered mechanically.
\begin{asst}\label{asst8}
$2\le k\le n+1, l\ge-1$, the center of $s[k-l-1\ldotd k]$ is $j+1$, $a[0\ldotd j]$ is well-computed, $p+j=2j_0-1$, and the length of the longest tail palindrome of $s[0\ldotd k]$ is not more than $l+2$, where $j_0$ is the $j$ after \emph{init}.
\end{asst}
\section{Analysis of the algorithm}
It's so hard for me to analyze the algorithm, so I'll show some partial results, which results in the running time complexity is $\Theta(n)$.
Look at the flow chart of the main loop. Suppose that $A,B$ is the times of going through the arrow from \emph{Move} to \emph{Check} and \emph{Move} to \emph{Advance}, and $C$ is the times of running of \emph{Fill a}. Out of Kirchhoff's first law for the flow chart, we obtain:
\begin{tabular}{|l|l|}
\hline
\emph{Initiate}&$1$\\\hline
\emph{All computed?}&$n+1$\\\hline
\emph{Check}&$n+A$\\\hline
\emph{Advance}&$n$\\\hline
\emph{Init}&$A+B$\\\hline
\emph{Decrease $l$}&$A+B+C$\\\hline
\emph{Found or hit bound?}&$A+B+C$\\\hline
\emph{Fill $a$}&$C$\\\hline
\emph{Bound?}&$A+B$\\\hline
\emph{Reassign $l$}&$B$\\
\hline
\end{tabular}
For there are only two operations containing $j\gets j+1$: \emph{Init} and \emph{Fill $a$}, we have $A+B+C=2n+1-1=2n$. Given that $b_k$ is the length of longest tail palindrome of $s[0\ldotd k]$, we have
\[
B=\sum_{k=2}^{n+1}[b_k=1]
\]
where $[P]$ is Iverson bracket~\cite{ivs}, which is well introduced in \emph{Concrete Mathematics}~\cite{cmath}.
I cannot get the more precise result for $A$ and $C$. Anybody good at mathematics or familiar with analysis of algorithms could help me!
\begin{thebibliography}{99}
\bibitem{taocp} Donald E. Knuth: \emph{The Art of Computer Programming} Volume 1, third edition.
\bibitem{cmath} Ronald L. Graham, Donald E. Knuth, Oren Patashnik: \emph{Concrete Mathematics} second edition.
\bibitem{noweb} \href{http://www.cs.tufts.edu/~nr/noweb/onepage.ps}{\emph{One-Page Guide to Using Noweb with \LaTeX}}
\bibitem{jj} Johan Jeuring: \href{http://finding-palindromes.blogspot.com/2012/05/finding-palindromes-efficiently.html}{\emph{Finding palindromes efficiently}}
\bibitem{lshort} T Oetiker: \emph{The Not So Short Introduction to \LaTeXe} (1995)
\bibitem{tikz1} Kjell Magne Fauske: \href{http://www.texample.net/tikz/examples/simple-flow-chart/}{\emph{Example: Simple flow chart}}
\bibitem{tikz2} The TikZ and PGF manual: \href{http://www.texample.net/tikz/examples/simple-flow-chart/}{\emph{Example: State machine}}
\bibitem{ivs} \href{http://en.wikipedia.org/wiki/Iverson_bracket}{\emph{Iverson bracket}}
\end{thebibliography}
\end{document}