That's right: you can scrape web data in C# just like with jQuery

Time:2021-12-29

1: Background

1. Tell a story

Some time ago I started a local news account covering everyday life. With this kind of news, everybody copies from everybody: I copy from you, and you copy from the official media. Ordinary readers love odd and curious stories, so I needed a way to collect such stories from other local accounts. Picking out the curious stories is actually simple (logistic regression will do), so this article focuses on how to scrape them.

In C#, the commonly used library is HtmlAgilityPack, but its mainstream approach is to extract web content with XPath, which I don't enjoy. After all, I have an inexplicable resistance to things I'm not familiar with. Ha. A coder of my age has been schooled by jQuery for at least 5-6 years, so I want a jQuery-like approach. Python has PyQuery for this. Is there something similar in C#? Hey, there really is: CsQuery, the subject of this article.

2: CsQuery

1. Installation

GitHub address: https://github.com/zone117x/CsQuery

Then install the CsQuery package from NuGet in Visual Studio.

2. Give several examples

Everything is ready. So how is it used? Don't worry, let me give two examples based on Blog Park (cnblogs.com).

1) Extract the friend links heading from the home page

To get only the heading text "Links" here, text() cannot be used directly: by default it returns the text of all child nodes as well.

What to do? Use the contents() method provided by jQuery, keep only the child nodes that are text nodes, and read the text of those nodes, with something like $("#friend_link").contents().filter(function () { return this.nodeType === 3; }).text().

That covers the JS side. How do you do it with CsQuery? The code mirrors the jQuery version closely:

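using System;
using System.Net;
using CsQuery;

static void Main(string[] args)
{
    // Download the home page and parse it into a CsQuery document.
    var jquery = CQ.CreateDocument(new WebClient().DownloadString("http://cnblogs.com"));

    // Keep only the direct text nodes of #friend_link, then read their text.
    var content = jquery["#friend_link"].Contents().Filter((dom) =>
    {
        return dom.NodeType == NodeType.TEXT_NODE;
    }).Text();

    Console.WriteLine(content);
}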
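To try the same idea without a network call, here is a minimal sketch that runs against an inline fragment. The markup, the class name OwnTextSketch and the helper OwnText are hypothetical and only illustrate the difference between Text() on the element and Text() on its direct text nodes.

```csharp
using System;
using CsQuery;

static class OwnTextSketch
{
    // Returns only the text that sits directly inside the matched element,
    // ignoring text that belongs to child elements.
    public static string OwnText(CQ dom, string selector)
    {
        return dom[selector]
            .Contents()
            .Filter(node => node.NodeType == NodeType.TEXT_NODE)
            .Text()
            .Trim();
    }

    static void Main()
    {
        // Hypothetical markup standing in for the real home page.
        var dom = CQ.CreateFragment(
            "<div id=\"friend_link\">Links" +
            "<a href=\"#\">Site A</a><a href=\"#\">Site B</a></div>");

        // Text of the heading plus the text of every child link.
        Console.WriteLine(dom["#friend_link"].Text());

        // Only the direct text node of the container.
        Console.WriteLine(OwnText(dom, "#friend_link"));
    }
}
```

The Trim() call is an addition of this sketch; real pages usually carry whitespace around such text nodes.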

I don't know whether extracting this kind of content with XPath is troublesome. The jQuery way is not exactly simple either, but at least it is familiar.

2) How to color some elements in HTML

Sometimes, for business reasons, you need to change the color of some HTML tags, for example turning the "Bo Wen" and "Special area" entries in the tab menu on the home page red.

How is this done with CsQuery? If you have used jQuery, the general steps are as follows:

  • Use Each to traverse each child li tag

  • Use the CSS method to style the a tag inside the li

  • Use Render to generate the new HTML

With these steps, the C# code is as follows:

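using System;
using System.Net;
using CsQuery;

static void Main(string[] args)
{
    // Disable HTML encoding so non-ASCII text is rendered as-is.
    Config.HtmlEncoder = HtmlEncoders.None;

    var jquery = CQ.CreateDocument(new WebClient().DownloadString("http://cnblogs.com"));

    var html = jquery["#nav_left li"].Each(dom =>
    {
        var self = jquery[dom];

        var text = self.Text().Trim();

        // "Bo Wen" and "special area" are the translated menu labels;
        // match them against the text the page actually serves.
        if (text == "Bo Wen" || text == "special area")
        {
            self.Find("a").CssSet(new { color = "red" });
        }
    }).Render();
}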
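The three steps can also be tried offline. The following is a minimal sketch, assuming a small hypothetical menu fragment; the class HighlightSketch, the method Highlight and the labels "Home" and "News" are made up for illustration and are not taken from the real page.

```csharp
using System;
using System.Linq;
using CsQuery;

static class HighlightSketch
{
    // Colors the <a> inside every <li> whose trimmed text is one of the
    // given labels, then renders the modified markup back to a string.
    public static string Highlight(string html, params string[] labels)
    {
        var dom = CQ.CreateFragment(html);

        dom["#nav_left li"].Each(node =>
        {
            var item = dom[node]; // wrap the raw node in a CQ object
            if (labels.Contains(item.Text().Trim()))
            {
                item.Find("a").CssSet(new { color = "red" });
            }
        });

        return dom.Render();
    }

    static void Main()
    {
        // Hypothetical menu markup; the real page is not reproduced here.
        var html = "<ul id=\"nav_left\">" +
                   "<li><a href=\"#\">Home</a></li>" +
                   "<li><a href=\"#\">News</a></li>" +
                   "</ul>";

        // Only the "News" link should receive an inline color style.
        Console.WriteLine(Highlight(html, "News"));
    }
}
```

Passing the labels as parameters keeps the traversal reusable when the menu entries to highlight change.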

3) Other operation methods

Besides the two operations above, CsQuery offers more than 100 practical methods, such as After, Before, ReplaceAll and Is. This article cannot introduce them one by one. If you are interested, download the library and play around with it.
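As a taste of those methods, here is a small sketch of Is, Before and After. It assumes that the string overloads of Before and After accept an HTML snippet the way their jQuery counterparts do; the markup and the names MoreMethodsSketch and AddNeighbors are hypothetical.

```csharp
using System;
using CsQuery;

static class MoreMethodsSketch
{
    // Inserts one sibling before and one after the selected <li>,
    // but only when that <li> carries the "current" class.
    public static string AddNeighbors(string html)
    {
        var dom = CQ.CreateFragment(html);
        var current = dom["#menu li"];

        // Is() tests the selection against a selector, like jQuery's is().
        if (current.Is(".current"))
        {
            current.Before("<li>previous</li>"); // sibling inserted in front
            current.After("<li>next</li>");      // sibling inserted behind
        }

        return dom.Render();
    }

    static void Main()
    {
        // Hypothetical one-item menu.
        var html = "<ul id=\"menu\"><li class=\"current\">here</li></ul>";
        Console.WriteLine(AddNeighbors(html));
    }
}
```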

3: Other uses

Besides extracting elements from HTML, I think CsQuery can also be used to fill in email templates when sending mail. After all, we have long used jQuery to build HTML, so CsQuery can do the same. Compared with XSLT it has both advantages and disadvantages. Here is an example:

1. Generate an HTML template

2. Use CsQuery to append li to ul

You can use Append to add content inside the ul node.

using System;
using CsQuery;

class Program
{
    static void Main(string[] args)
    {
        Config.HtmlEncoder = HtmlEncoders.None;

        var strlist = new string[2] { "1", "2" };

        // Path to the HTML template file.
        var path = Environment.CurrentDirectory + @"\.html";
        var jquery = CQ.CreateFromFile(path);

        // Append one li per item to the #main node.
        foreach (var str in strlist)
        {
            jquery.Find("#main").Append($"<li>{str}</li>");
        }

        var html = jquery.Render();
    }
}
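For a version that does not depend on a template file, the sketch below keeps the template in a string. The template content (a single ul with id main) is an assumption made for illustration, as are the names TemplateSketch and BuildList.

```csharp
using System;
using System.Collections.Generic;
using CsQuery;

static class TemplateSketch
{
    // Parses the template, appends one <li> per item to #main
    // and returns the rendered markup.
    public static string BuildList(string template, IEnumerable<string> items)
    {
        var dom = CQ.CreateFragment(template);
        var list = dom["#main"];

        foreach (var item in items)
        {
            list.Append($"<li>{item}</li>");
        }

        return dom.Render();
    }

    static void Main()
    {
        // Hypothetical template; a real email template would be larger.
        var template = "<ul id=\"main\"></ul>";
        Console.WriteLine(BuildList(template, new[] { "1", "2" }));
    }
}
```

The items are inserted as raw markup here, so values coming from users would need to be HTML-encoded first.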

3. Partial rendering with RenderSelection

The Render method renders the whole DOM into HTML, but sometimes you only need the part you modified rather than the whole HTML. This is partial rendering, and RenderSelection can be used for it. The code is as follows:

static void Main(string[] args)
{
    Config.HtmlEncoder = HtmlEncoders.None;

    var strlist = new string[2] { "1", "2" };

    // Path to the HTML template file.
    var path = Environment.CurrentDirectory + @"\.html";
    var jquery = CQ.CreateFromFile(path);

    var current = jquery.Find("#main");

    // Append one li per item to the #main node.
    foreach (var str in strlist)
    {
        current.Append($"<li>{str}</li>");
    }

    // Render only the selected node instead of the whole document.
    var html = current.RenderSelection();

    Console.WriteLine(html);
}

------------- output ----------------

12
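To compare whole-document rendering with partial rendering side by side, here is a minimal sketch on an inline document. The template, the class RenderSketch and the method RenderBoth are hypothetical; only Render and RenderSelection come from the text above.

```csharp
using System;
using System.Collections.Generic;
using CsQuery;

static class RenderSketch
{
    // Returns two strings: [0] the whole document, [1] only the #main node.
    public static string[] RenderBoth(string template, IEnumerable<string> items)
    {
        var dom = CQ.CreateDocument(template);
        var list = dom["#main"];

        foreach (var item in items)
        {
            list.Append($"<li>{item}</li>");
        }

        return new[] { dom.Render(), list.RenderSelection() };
    }

    static void Main()
    {
        // Hypothetical template with content outside the list.
        var template = "<html><body><h1>Report</h1>" +
                       "<ul id=\"main\"></ul></body></html>";

        var result = RenderBoth(template, new[] { "1", "2" });

        Console.WriteLine(result[0]); // includes the heading and the list
        Console.WriteLine(result[1]); // the list element only
    }
}
```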

4: Summary

jQuery is a comfortable way of working for me. After all, I'm familiar with it! HTML5 has since added querySelector and querySelectorAll, which support CSS3 selectors and are very powerful. jQuery, however, is flexible not only in its selectors but also in how it manipulates nodes. Overall, when a page is not heavily interactive, it is fine to be a little nostalgic.

For more high-quality content, see my GitHub: dotnetfly
