Release 5.1.0
rzo1 committed Oct 22, 2024
1 parent ba6598c commit 4edbbb7
Showing 16 changed files with 41 additions and 47 deletions.
37 changes: 17 additions & 20 deletions README.md
@@ -9,7 +9,7 @@ This repository contains a fork of [yasserg/crawler4j](https://github.com/yasser
---

crawler4j is an open source web crawler for Java which provides a simple interface for
crawling the Web. Using it, you can setup a multi-threaded web crawler in few minutes.
crawling the Web. Using it, you can set up a multithreaded web crawler in few minutes.

## Table of content

@@ -25,21 +25,18 @@ crawling the Web. Using it, you can setup a multi-threaded web crawler in few mi

## Why you should use this fork?

This fork starts where the development of the previous main repository stalled.
This fork picks up where development on the original main repository left off, bringing several key improvements:

Some highlights include:

- choice between multiple frontier implementations => avoid using a database with a license that doesn't comply with your use-case
- easy substitution of various parser implementations (not only for html, but also css, binary, and plain text)
- dynamic authentication
- improved exception handling, more versatile to customize
- fixes various parsing issues
- more documentation
- more tests and all tests are JUnit5 based (so no knowledge of Groovy and/or Spock needed anymore to maintain the codebase)
- uses Apache Maven as build tool
- provides a clean upgrade path by keeping backward compatibility in mind and deprecating methods before removing them
- more eyes have gone through the code, so readability and correctness have improved
- maintained, i.e. dependencies are often updated to their latest versions
- Offers a choice between multiple frontier implementations, allowing you to avoid databases with incompatible licenses for your use case.
- Simplifies swapping out parser implementations, supporting not just HTML but also CSS, binary, and plain text formats.
- Supports dynamic authentication.
- Enhances exception handling, making it easier to customize.
- Fixes various parsing issues.
- Includes expanded documentation.
- Features additional tests, now entirely based on JUnit 5 (eliminating the need for Groovy or Spock knowledge to maintain the codebase).
- Utilizes Apache Maven as the build tool.
- Ensures a smooth upgrade path with backward compatibility and method deprecation before removal.
- Improves code readability and correctness, with more contributors reviewing the code.

## Installation

@@ -51,7 +48,7 @@ Add the following dependency to your pom.xml:
<dependency>
<groupId>de.hs-heilbronn.mi</groupId>
<artifactId>crawler4j-with-sleepycat</artifactId>
<version>5.0.2</version>
<version>5.1.0</version>
</dependency>
```

@@ -63,7 +60,7 @@ Otherwise, you can use `HSQLDB` instead
<dependency>
<groupId>de.hs-heilbronn.mi</groupId>
<artifactId>crawler4j-with-hsqldb</artifactId>
<version>5.0.2</version>
<version>5.1.0</version>
</dependency>
```

@@ -73,18 +70,18 @@ or you use an external [crawler-commons/url-frontier](https://github.com/crawler
<dependency>
<groupId>de.hs-heilbronn.mi</groupId>
<artifactId>crawler4j-with-urlfrontier</artifactId>
<version>5.0.2</version>
<version>5.1.0</version>
</dependency>
```

## Quickstart

### Archetype

Since `5.0.1`, we provide a Maven archetype to bootstrap crawler4j development. Just urn
We provide a Maven archetype to bootstrap crawler4j development. Just run

```bash
mvn archetype:generate -DarchetypeGroupId=de.hs-heilbronn.mi -DarchetypeArtifactId=crawler4j-archetype -DarchetypeVersion=5.0.1
mvn archetype:generate -DarchetypeGroupId=de.hs-heilbronn.mi -DarchetypeArtifactId=crawler4j-archetype -DarchetypeVersion=5.1.0
```

### Manual
2 changes: 1 addition & 1 deletion crawler4j-archetype/pom.xml
@@ -5,7 +5,7 @@
<parent>
<artifactId>crawler4j-parent</artifactId>
<groupId>de.hs-heilbronn.mi</groupId>
<version>5.1.0-SNAPSHOT</version>
<version>5.1.0</version>
</parent>
<modelVersion>4.0.0</modelVersion>

2 changes: 1 addition & 1 deletion crawler4j-boms/crawler4j-with-hsqldb/pom.xml
@@ -5,7 +5,7 @@
<parent>
<artifactId>crawler4j-boms</artifactId>
<groupId>de.hs-heilbronn.mi</groupId>
<version>5.1.0-SNAPSHOT</version>
<version>5.1.0</version>
</parent>
<modelVersion>4.0.0</modelVersion>

2 changes: 1 addition & 1 deletion crawler4j-boms/crawler4j-with-sleepycat/pom.xml
@@ -5,7 +5,7 @@
<parent>
<artifactId>crawler4j-boms</artifactId>
<groupId>de.hs-heilbronn.mi</groupId>
<version>5.1.0-SNAPSHOT</version>
<version>5.1.0</version>
</parent>
<modelVersion>4.0.0</modelVersion>
<packaging>pom</packaging>
2 changes: 1 addition & 1 deletion crawler4j-boms/crawler4j-with-urlfrontier/pom.xml
@@ -5,7 +5,7 @@
<parent>
<artifactId>crawler4j-boms</artifactId>
<groupId>de.hs-heilbronn.mi</groupId>
<version>5.1.0-SNAPSHOT</version>
<version>5.1.0</version>
</parent>
<modelVersion>4.0.0</modelVersion>
<packaging>pom</packaging>
2 changes: 1 addition & 1 deletion crawler4j-boms/pom.xml
@@ -5,7 +5,7 @@
<parent>
<artifactId>crawler4j-parent</artifactId>
<groupId>de.hs-heilbronn.mi</groupId>
<version>5.1.0-SNAPSHOT</version>
<version>5.1.0</version>
</parent>
<modelVersion>4.0.0</modelVersion>

2 changes: 1 addition & 1 deletion crawler4j-commons/pom.xml
@@ -5,7 +5,7 @@
<parent>
<artifactId>crawler4j-parent</artifactId>
<groupId>de.hs-heilbronn.mi</groupId>
<version>5.1.0-SNAPSHOT</version>
<version>5.1.0</version>
</parent>
<modelVersion>4.0.0</modelVersion>

2 changes: 1 addition & 1 deletion crawler4j-core/pom.xml
@@ -5,7 +5,7 @@
<parent>
<artifactId>crawler4j-parent</artifactId>
<groupId>de.hs-heilbronn.mi</groupId>
<version>5.1.0-SNAPSHOT</version>
<version>5.1.0</version>
</parent>

<artifactId>crawler4j-core</artifactId>
2 changes: 1 addition & 1 deletion crawler4j-examples/crawler4j-examples-base/pom.xml
@@ -3,7 +3,7 @@
<parent>
<artifactId>crawler4j-examples</artifactId>
<groupId>de.hs-heilbronn.mi</groupId>
<version>5.1.0-SNAPSHOT</version>
<version>5.1.0</version>
</parent>
<artifactId>crawler4j-examples-base</artifactId>
<name>${project.groupId}:${project.artifactId}</name>
2 changes: 1 addition & 1 deletion crawler4j-examples/crawler4j-examples-postgres/pom.xml
@@ -3,7 +3,7 @@
<parent>
<artifactId>crawler4j-examples</artifactId>
<groupId>de.hs-heilbronn.mi</groupId>
<version>5.1.0-SNAPSHOT</version>
<version>5.1.0</version>
</parent>
<name>${project.groupId}:${project.artifactId}</name>
<artifactId>crawler4j-examples-postgres</artifactId>
2 changes: 1 addition & 1 deletion crawler4j-examples/pom.xml
@@ -3,7 +3,7 @@
<parent>
<artifactId>crawler4j-parent</artifactId>
<groupId>de.hs-heilbronn.mi</groupId>
<version>5.1.0-SNAPSHOT</version>
<version>5.1.0</version>
</parent>

<name>${project.groupId}:${project.artifactId}</name>
2 changes: 1 addition & 1 deletion crawler4j-frontier/crawler4j-frontier-hsqldb/pom.xml
@@ -5,7 +5,7 @@
<parent>
<artifactId>crawler4j-frontier</artifactId>
<groupId>de.hs-heilbronn.mi</groupId>
<version>5.1.0-SNAPSHOT</version>
<version>5.1.0</version>
</parent>
<modelVersion>4.0.0</modelVersion>

2 changes: 1 addition & 1 deletion crawler4j-frontier/crawler4j-frontier-sleepycat/pom.xml
@@ -5,7 +5,7 @@
<parent>
<artifactId>crawler4j-frontier</artifactId>
<groupId>de.hs-heilbronn.mi</groupId>
<version>5.1.0-SNAPSHOT</version>
<version>5.1.0</version>
</parent>
<modelVersion>4.0.0</modelVersion>

2 changes: 1 addition & 1 deletion crawler4j-frontier/crawler4j-frontier-urlfrontier/pom.xml
@@ -5,7 +5,7 @@
<parent>
<artifactId>crawler4j-frontier</artifactId>
<groupId>de.hs-heilbronn.mi</groupId>
<version>5.1.0-SNAPSHOT</version>
<version>5.1.0</version>
</parent>
<modelVersion>4.0.0</modelVersion>

2 changes: 1 addition & 1 deletion crawler4j-frontier/pom.xml
@@ -5,7 +5,7 @@
<parent>
<artifactId>crawler4j-parent</artifactId>
<groupId>de.hs-heilbronn.mi</groupId>
<version>5.1.0-SNAPSHOT</version>
<version>5.1.0</version>
</parent>
<modelVersion>4.0.0</modelVersion>

23 changes: 10 additions & 13 deletions pom.xml
@@ -4,7 +4,7 @@
<groupId>de.hs-heilbronn.mi</groupId>
<artifactId>crawler4j-parent</artifactId>
<packaging>pom</packaging>
<version>5.1.0-SNAPSHOT</version>
<version>5.1.0</version>
<name>${project.groupId}:${project.artifactId}</name>

<description>Open Source Web Crawler for Java</description>
@@ -23,18 +23,6 @@
<developerConnection>scm:git:git@github.com:rzo1/crawler4j.git</developerConnection>
<tag>HEAD</tag>
</scm>
<distributionManagement>
<snapshotRepository>
<id>ossrh</id>
<name>Sonatype Nexus snapshot repository</name>
<url>https://oss.sonatype.org/content/repositories/snapshots</url>
</snapshotRepository>
<repository>
<id>ossrh</id>
<name>Sonatype Nexus release repository</name>
<url>https://oss.sonatype.org/service/local/staging/deploy/maven2/</url>
</repository>
</distributionManagement>
<developers>
<developer>
<id>yasserg</id>
@@ -168,6 +156,15 @@
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-gpg-plugin</artifactId>
</plugin>
<plugin>
<groupId>org.sonatype.central</groupId>
<artifactId>central-publishing-maven-plugin</artifactId>
<version>0.6.0</version>
<extensions>true</extensions>
<configuration>
<publishingServerId>central</publishingServerId>
</configuration>
</plugin>
</plugins>
</build>
</profile>
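
Note on the `pom.xml` hunk above: it drops the OSSRH `distributionManagement` block and adds `central-publishing-maven-plugin` with `publishingServerId` set to `central`. The plugin looks up credentials from a Maven `settings.xml` server entry whose `id` matches that value. A minimal sketch of such an entry follows, assuming the usual `~/.m2/settings.xml` location; the username and password values are placeholders for a Central Portal user token and are not taken from this commit.

```xml
<settings>
  <servers>
    <server>
      <!-- Must match <publishingServerId> in the plugin configuration -->
      <id>central</id>
      <!-- Placeholders (assumption): substitute the generated user token -->
      <username>TOKEN_USERNAME</username>
      <password>TOKEN_PASSWORD</password>
    </server>
  </servers>
</settings>
```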