Concurrent implementation of a Go web crawler

Time: 2020-10-20

Title: Exercise: Web Crawler

This solution directly references https://github.com/golang/tour/blob/master/solutions/webcrawler.go. However, that code uses a chan bool to signal when each child goroutine has finished, whereas my version uses a sync.WaitGroup to let the main goroutine wait for all child goroutines to complete.

For the complete code, see https://github.com/sxpujs/go-example/blob/master/crawl/web-crawler.go

The code added to the original program is as follows:

var fetched = struct {
	m map[string]error
	sync.Mutex
}{m: map[string]error{}}

// Crawl uses fetcher to recursively crawl
// pages starting with url, to a maximum of depth.
func Crawl(url string, depth int, fetcher Fetcher) {
	if depth <= 0 {
		return
	}
	fetched.Lock()
	if _, ok := fetched.m[url]; ok { // already visited or in progress
		fetched.Unlock()
		return
	}
	fetched.m[url] = nil // mark in progress so concurrent calls skip this URL
	fetched.Unlock()

	body, urls, err := fetcher.Fetch(url)

	fetched.Lock()
	fetched.m[url] = err
	fetched.Unlock()

	if err != nil {
		return
	}
	fmt.Printf("Found: %s %q\n", url, body)
	var wg sync.WaitGroup
	for _, u := range urls {
		wg.Add(1)
		go func(url string) {
			defer wg.Done()
			Crawl(url, depth-1, fetcher)
		}(u)
	}
	wg.Wait()
}
