Crawlab v0.3.1 release (Docker image optimization)

Time: 2019-10-9

Crawlab is a distributed crawler management platform based on Golang, supporting Python, NodeJS, Java, Go, PHP and other programming languages as well as a variety of crawler frameworks.

The project has been well received by crawler enthusiasts and developers since its launch in March this year, and many users have said they will use Crawlab to build their company’s crawler platform. It now has 2K stars on GitHub and 1.4K pulls on Docker Hub. Over several months of iteration we have added scheduled tasks, data analysis, configurable crawlers, automatic field extraction, result downloads, crawler uploads, Docker deployment, and other features. Crawlab will keep becoming more practical and more comprehensive, so that it can genuinely help users solve crawler management problems.

Crawlab mainly addresses the difficulty of managing a large number of crawlers. For example, a team that has to monitor hundreds of websites with a mix of scrapy and selenium projects will find it hard to manage them all at the same time, and managing them from the command line is costly and error-prone. Crawlab supports any language and any framework and, combined with task scheduling and task monitoring, makes it easy to monitor and manage large-scale crawler projects effectively.

  • View Demo
  • Github: https://github.com/crawlab-te…

Update content

This release, v0.3.1, is an optimization update focusing on Docker image optimization, front-end optimization, and some bug fixes.

The updates are as follows:

Features/optimizations

  • Docker image optimization: The Docker image is further split into master, worker, and frontend images to improve production environment support, and its size is reduced by using the alpine image.
  • Unit testing: Part of the back-end Golang code is now covered by unit test cases.
  • Front-end optimization: UI improvements to the login page, button sizes, prompts, etc.
  • More flexible node registration: Users can pass a variable to identify a node instead of the default MAC address (users who need to deploy on multiple machines may want to look at this feature).
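
The node registration change can be pictured as a simple fallback: use a user-supplied identifier if one is given, otherwise fall back to the machine's MAC address. The following is only a minimal sketch in Python (Crawlab's back end is written in Golang, and this is not its actual code); the environment variable name NODE_KEY and the function node_key are hypothetical names chosen for illustration.

```python
import os
import uuid


def node_key(env=None):
    """Return an identifier for this node.

    Assumption: the user-supplied identifier arrives in an environment
    variable called NODE_KEY. That name is hypothetical and is not taken
    from Crawlab's configuration.
    """
    env = os.environ if env is None else env
    custom = env.get("NODE_KEY", "").strip()
    if custom:
        # A user-supplied value wins, so several containers that report
        # the same MAC address can still be told apart.
        return custom
    # Default: format the 48-bit hardware address as aa:bb:cc:dd:ee:ff.
    # Note that uuid.getnode() may return a random value when no MAC
    # address can be read.
    mac = uuid.getnode()
    return ":".join(f"{(mac >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))


if __name__ == "__main__":
    print(node_key({"NODE_KEY": "worker-1"}))  # user-supplied identifier
    print(node_key({}))                        # MAC-address fallback
```

With a distinct value per machine or container, every node registers under its own identifier even when the MAC addresses are the same.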

Bug fixes

  • Error uploading crawler files: Uploading large crawler files could cause a memory overflow. #150
  • Unable to synchronize crawler files: Insufficient permissions caused crawler synchronization to fail; fixed by adding write permission. #114
  • Crawler page problem: Fixed by removing the “Site” field from the crawler page. #112
  • Node display problem: When nodes ran in Docker on multiple machines, only one node was displayed; fixed by passing a variable as the node identifier. #99
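
The release notes do not say how the large-file upload problem (#150) was fixed. One common way to avoid this kind of memory overflow is to copy the upload in fixed-size chunks rather than reading the whole file into memory. The sketch below shows that general technique in Python; the function name copy_in_chunks and the 1 MiB chunk size are illustrative assumptions, not Crawlab code.

```python
import io

CHUNK_SIZE = 1024 * 1024  # 1 MiB; an illustrative value, not from Crawlab


def copy_in_chunks(src, dst, chunk_size=CHUNK_SIZE):
    """Copy a binary stream to another stream one chunk at a time.

    Only chunk_size bytes are held in memory at once, regardless of how
    large the source is. Returns the total number of bytes copied.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # empty bytes object means end of stream
            break
        dst.write(chunk)
        total += len(chunk)
    return total


if __name__ == "__main__":
    source = io.BytesIO(b"x" * 2500)
    target = io.BytesIO()
    print(copy_in_chunks(source, target, chunk_size=1024))
```

The standard library function shutil.copyfileobj does the same job; the loop is written out here only to make the bounded memory use visible.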

Next steps

  • [] Configurable crawler
  • [] Centralized log management
  • [] Exception monitoring and alerting
  • [] RBAC permission control
  • [] JWT permission validation optimization
  • [] Installing third-party packages from the interface

We are currently planning the next steps, including the priority of each task and how to implement it. If you are interested in these or other features, please open a GitHub issue or add the author on WeChat (Tikazyq1) to let us know.

Community

If you find Crawlab helpful for your daily development or your company, please add the author on WeChat (Tikazyq1) with the note “Crawlab”, and the author will add you to the discussion group. Stars on GitHub are welcome, and if you run into any problems, feel free to open an issue on GitHub. Contributions to the development of Crawlab are also welcome.


This article is automatically generated by the article publishing tool ArtiPub