Quickly Building a Novel Search Site: 2. Content Page Parsing

Date: 2019-9-11

Third-party libraries

  1. jsoup
  2. OkHttp

Elements to parse

  1. Chapter: previous chapter
  2. Chapter: next chapter
  3. Catalog (table of contents)
  4. Content


Table design

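// Per-site configuration fields. The parsing code reads them through the
// BookSite getters and passes each one to jsoup as a CSS selector.

/** Selector for the chapter content. */
private String content;

/** Selector for the chapter title. */
@Field("content_title")
private String contentTitle;

/** Selector for the link whose href becomes the chapter URL. */
@Field("chapter_url")
private String chapterUrl;

/** Selector for the next-chapter link. */
@Field("next_chapter_url")
private String nextChapterUrl;

/** Selector for the previous ("last") chapter link. */
@Field("last_chapter_url")
private String lastChapterUrl;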

Parsing code

// Downloads a chapter page and extracts its fields with the site's selectors.
// Any element that is not found is skipped, so its field stays unset.
public BookChapter content(String url) {
    BookChapter bookChapter = new BookChapter();

    // Selector configuration for the site this URL belongs to.
    BookSite bookSite = getSite(url);
    try {
        Document document = download(url);

        // Chapter title
        Element titleElement = document.selectFirst(bookSite.getContentTitle());
        if (titleElement != null) {
            bookChapter.setName(titleElement.text());
        }

        // Chapter link; absUrl resolves a relative href against the page's base URI
        Element chapterElement = document.selectFirst(bookSite.getChapterUrl());
        if (chapterElement != null) {
            bookChapter.setChapterUrl(chapterElement.absUrl("href"));
        }

        // Next-chapter link
        Element nextElement = document.selectFirst(bookSite.getNextChapterUrl());
        if (nextElement != null) {
            bookChapter.setNextChapterUrl(nextElement.absUrl("href"));
        }

        // Previous ("last") chapter link
        Element lastElement = document.selectFirst(bookSite.getLastChapterUrl());
        if (lastElement != null) {
            bookChapter.setLastChapterUrl(lastElement.absUrl("href"));
        }

        // Chapter content: strip links, scripts, and styles, then keep the remaining HTML
        Element contentElement = document.selectFirst(bookSite.getContent());
        if (contentElement != null) {
            contentElement.select("a").remove();
            contentElement.select("script").remove();
            contentElement.select("style").remove();

            bookChapter.setContent(contentElement.html());
        }
    } catch (IOException e) {
        // A failed download is logged; the partially filled chapter is still returned.
        log.error(e.getMessage(), e);
    }

    return bookChapter;
}
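The parsing code calls getSite(url), which the post does not show. One way such a lookup could work is to key the selector configuration by host name. The sketch below is only an illustration under that assumption: SiteRegistry, SiteSelectors, register, and find are hypothetical names, and SiteSelectors is a reduced stand-in for BookSite, not the author's class.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical registry: maps a host name to that site's content-page selectors.
class SiteRegistry {

    // Hypothetical stand-in for BookSite, reduced to the five selector fields.
    static final class SiteSelectors {
        final String contentTitle;
        final String content;
        final String chapterUrl;
        final String nextChapterUrl;
        final String lastChapterUrl;

        SiteSelectors(String contentTitle, String content, String chapterUrl,
                      String nextChapterUrl, String lastChapterUrl) {
            this.contentTitle = contentTitle;
            this.content = content;
            this.chapterUrl = chapterUrl;
            this.nextChapterUrl = nextChapterUrl;
            this.lastChapterUrl = lastChapterUrl;
        }
    }

    private final Map<String, SiteSelectors> byHost = new HashMap<>();

    // Host names are case-insensitive, so they are stored in lower case.
    void register(String host, SiteSelectors selectors) {
        byHost.put(host.toLowerCase(Locale.ROOT), selectors);
    }

    // Returns the selectors for the URL's host, or empty if the URL is
    // malformed or the host has not been registered.
    Optional<SiteSelectors> find(String url) {
        String host;
        try {
            host = URI.create(url).getHost();
        } catch (IllegalArgumentException e) {
            return Optional.empty();
        }
        if (host == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(byHost.get(host.toLowerCase(Locale.ROOT)));
    }
}
```

Returning an Optional makes the caller decide what to do with a URL from an unknown site. If getSite can return null in that case, content() as written would fail on its first getter call, so a guard of this kind may be worth adding.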

Final result

[Image: screenshot of the final result]

Difficulties

None of this is technically difficult. The hard part is day-to-day maintenance.
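One plausible source of that maintenance work is that a site's selectors stop matching after it changes its page layout, in which case content() silently leaves fields unset. As a rough sketch of how this could be noticed, the helper below lists the fields that came back empty. SelectorHealthCheck and missingFields are hypothetical names, and the parsed values are passed as a plain map because the BookChapter class is not shown in the post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: reports which parsed fields are empty, which may
// indicate that the corresponding selector no longer matches the page.
class SelectorHealthCheck {

    // Returns the names of all fields whose value is null or blank,
    // in the iteration order of the given map.
    static List<String> missingFields(Map<String, String> parsedFields) {
        List<String> missing = new ArrayList<>();
        for (Map.Entry<String, String> entry : parsedFields.entrySet()) {
            String value = entry.getValue();
            if (value == null || value.trim().isEmpty()) {
                missing.add(entry.getKey());
            }
        }
        return missing;
    }
}
```

A missing next-chapter or previous-chapter link is not always an error, since the first and last chapters of a book legitimately lack one. An empty title or content is a stronger hint that a selector needs updating.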
