Connection tracking state transitions

Time:2019-11-14

1. If the first packet of a flow arrives at the PREROUTING or OUTPUT hook, its connection tracking entry (CT) is created in resolve_normal_ct. If a matching expectation (expected connection) exists, the new entry is associated with the corresponding master connection and its state is set to IP_CT_RELATED. If no expectation matches, the state is set to IP_CT_NEW. Normally these are the only two states a first packet can have.

2. The first reply packet and all later reply packets also pass through PREROUTING or OUTPUT. resolve_normal_ct finds the existing entry, which is in the IP_CT_RELATED or IP_CT_NEW state, and sets the state to IP_CT_ESTABLISHED_REPLY.

3. For request packets after the first one, resolve_normal_ct finds the existing entry at PREROUTING or OUTPUT, and the state is generally set to IP_CT_ESTABLISHED.

4. When an ICMP error message (such as source quench, TTL exceeded, or destination unreachable) reaches Netfilter, the CT it belongs to is looked up from the original packet header carried inside the ICMP payload. If the ICMP error was triggered by a request-direction packet of that CT, the state is set to IP_CT_RELATED; if it was triggered by a reply-direction packet, it is set to IP_CT_RELATED_REPLY.

5. An important case is the first packet of a child connection. init_conntrack first creates the connection tracking entry, then looks up the matching expectation and associates the entry with it, and finally runs the expectation's expectfn callback before init_conntrack returns.
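The state choice described in points 1 to 3 can be sketched as a small user-space model. This is only an illustration: the type and function names (ct_info, ct_model, classify_packet) are hypothetical, the enum values mirror the kernel's enum ip_conntrack_info as far as I recall them, and the rule that request packets become established only once a reply has been seen is an assumption about how resolve_normal_ct decides.

```c
#include <assert.h>
#include <stdbool.h>

/* Local mirror of the conntrack info values (names are hypothetical). */
enum ct_info {
    CT_ESTABLISHED       = 0,
    CT_RELATED           = 1,
    CT_NEW               = 2,
    CT_ESTABLISHED_REPLY = 3,
    CT_RELATED_REPLY     = 4   /* only used for ICMP errors, not modeled here */
};

enum ct_dir { DIR_ORIGINAL = 0, DIR_REPLY = 1 };

/* Simplified view of a conntrack entry: just the two facts that matter. */
struct ct_model {
    bool expected;    /* entry was created from a matching expectation */
    bool seen_reply;  /* at least one reply packet has been seen       */
};

/* Pick the per-packet state from the entry and the packet direction. */
enum ct_info classify_packet(const struct ct_model *ct, enum ct_dir dir)
{
    if (dir == DIR_REPLY)
        return CT_ESTABLISHED_REPLY;  /* point 2: any reply packet       */
    if (ct->seen_reply)
        return CT_ESTABLISHED;        /* point 3: later request packets  */
    if (ct->expected)
        return CT_RELATED;            /* point 1: child of a master      */
    return CT_NEW;                    /* point 1: unrelated first packet */
}
```

Calling classify_packet with an entry whose expected flag is set and DIR_ORIGINAL gives the related state, which is the case the FTP data connection falls into.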

We take the expectation created for a typical FTP session as an example to analyze what expectfn does.

First, assume there is no NAT.

Active mode

The client sends PORT xxx,xxx,xxx,xxx,ppp,ppp to the server. Connection tracking intercepts the packet through the FTP helper and creates an expectation for the request direction of the data connection. Assume the IP address taken from the PORT command is dataip and the port is dataport, and that the reply-direction source IP of the main connection (the server address) is rip. The expectation is then:

dip/mask      = dataip/0xffffffff
dport/mask    = dataport/0xffff
sip/mask      = rip/0xffffffff   (the server may choose a different source IP, so using the server address of the parent connection is not strictly accurate, but it holds in most normal cases)
sport/mask    = 0/0
protocol = tcp
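To make the role of the masks concrete, here is a minimal user-space sketch of a masked tuple comparison. The struct and function names (flow, expect_model, expect_matches, make_ftp_expect) are hypothetical and the addresses in the usage note are example values; the kernel's own data structures are laid out differently, so treat this as an illustration of the idea only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Host-order IPv4 address from four octets. */
#define IPV4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | ((uint32_t)(c) << 8) | (uint32_t)(d))

struct flow {
    uint32_t sip, dip;
    uint16_t sport, dport;
    uint8_t  proto;
};

struct expect_model {
    struct flow tuple;  /* expected values                          */
    struct flow mask;   /* bits that must match; 0 means "any value" */
};

/* A packet matches when every masked field equals the expected value. */
bool expect_matches(const struct expect_model *e, const struct flow *p)
{
    return ((p->sip   ^ e->tuple.sip)   & e->mask.sip)   == 0 &&
           ((p->dip   ^ e->tuple.dip)   & e->mask.dip)   == 0 &&
           ((p->sport ^ e->tuple.sport) & e->mask.sport) == 0 &&
           ((p->dport ^ e->tuple.dport) & e->mask.dport) == 0 &&
           p->proto == e->tuple.proto;
}

/* Build the active-mode expectation from the listing above. */
struct expect_model make_ftp_expect(uint32_t rip, uint32_t dataip, uint16_t dataport)
{
    struct expect_model e = {
        .tuple = { .sip = rip, .dip = dataip, .sport = 0,
                   .dport = dataport, .proto = 6 /* TCP */ },
        .mask  = { .sip = 0xffffffffu, .dip = 0xffffffffu, .sport = 0,
                   .dport = 0xffff, .proto = 0xff },
    };
    return e;
}
```

With rip = 2.2.2.2, dataip = 10.10.10.10 and dataport = 10000, a data connection from source port 20 and one from source port 45000 both match, because sport/mask is 0/0; a packet to port 10001 does not.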

This follows from the call to nf_ct_expect_init made in the helper:

/* Initialize the expectation: the opposite direction's source address is the source, daddr is the destination, and the port parsed from the payload is the destination port. */
    nf_ct_expect_init(exp, NF_CT_EXPECT_CLASS_DEFAULT, cmd.l3num,
              &ct->tuplehash[!dir].tuple.src.u3, daddr,
              IPPROTO_TCP, NULL, &cmd.u.tcp.port);

The function itself:

void nf_ct_expect_init(struct nf_conntrack_expect *exp, unsigned int class,
               u_int8_t family,
               const union nf_inet_addr *saddr,
               const union nf_inet_addr *daddr,
               u_int8_t proto, const __be16 *src, const __be16 *dst)
{
    int len;
    /* Build the expectation's tuple and mask from the given addresses and ports. */

    if (family == AF_INET)
        len = 4;
    else
        len = 16;

    exp->flags = 0;
    exp->class = class;
    exp->expectfn = NULL;
    exp->helper = NULL;
    exp->tuple.src.l3num = family;
    exp->tuple.dst.protonum = proto;
    /* Taking FTP active mode as an example, with NAT protecting the client */
    if (saddr) { /* the server's public source IP, because in active mode the server initiates the data connection */
        memcpy(&exp->tuple.src.u3, saddr, len);
        if (sizeof(exp->tuple.src.u3) > len)
            /* address needs to be cleared for nf_ct_tuple_equal */
            memset((void *)&exp->tuple.src.u3 + len, 0x00,
                   sizeof(exp->tuple.src.u3) - len);
        memset(&exp->mask.src.u3, 0xFF, len);
        if (sizeof(exp->mask.src.u3) > len)
            memset((void *)&exp->mask.src.u3 + len, 0x00,
                   sizeof(exp->mask.src.u3) - len);
    } else { /* no source address given: match any */
        memset(&exp->tuple.src.u3, 0x00, sizeof(exp->tuple.src.u3));
        memset(&exp->mask.src.u3, 0x00, sizeof(exp->mask.src.u3));
    }

    if (src) { /* source port: usually unset, because the FTP data connection has not been initiated yet */
        exp->tuple.src.u.all = *src;
        exp->mask.src.u.all = htons(0xFFFF);
    } else {
        exp->tuple.src.u.all = 0;
        exp->mask.src.u.all = 0;
    }
    /*The destination address is the destination IP of reverse connection, which is the IP of the client after NAT*/
    memcpy(&exp->tuple.dst.u3, daddr, len);
    if (sizeof(exp->tuple.dst.u3) > len)
        /* address needs to be cleared for nf_ct_tuple_equal */
        memset((void *)&exp->tuple.dst.u3 + len, 0x00,
               sizeof(exp->tuple.dst.u3) - len);
    /*The destination port is the port in the port command, which is the modified port*/
    exp->tuple.dst.u.all = *dst;

#ifdef CONFIG_NF_NAT_NEEDED
    memset(&exp->saved_addr, 0, sizeof(exp->saved_addr));
    memset(&exp->saved_proto, 0, sizeof(exp->saved_proto));
#endif
}
static int help(struct sk_buff *skb,
        unsigned int protoff,
        struct nf_conn *ct,
        enum ip_conntrack_info ctinfo)
{
    unsigned int dataoff, datalen;
    const struct tcphdr *th;
    struct tcphdr _tcph;
    const char *fb_ptr;
    int ret;
    u32 seq;
    int dir = CTINFO2DIR(ctinfo);
    unsigned int uninitialized_var(matchlen), uninitialized_var(matchoff);
    struct nf_ct_ftp_master *ct_ftp_info = nfct_help_data(ct);
    struct nf_conntrack_expect *exp;
    union nf_inet_addr *daddr;
    struct nf_conntrack_man cmd = {};
    unsigned int i;
    int found = 0, ends_in_nl;
    typeof(nf_nat_ftp_hook) nf_nat_ftp;

    /* Until there's been traffic both ways, don't look in packets. */
    if (ctinfo != IP_CT_ESTABLISHED &&
        ctinfo != IP_CT_ESTABLISHED_REPLY) {
        pr_debug("ftp: Conntrackinfo = %u\n", ctinfo);
        return NF_ACCEPT;
    }

    th = skb_header_pointer(skb, protoff, sizeof(_tcph), &_tcph);
    if (th == NULL)
        return NF_ACCEPT;

    dataoff = protoff + th->doff * 4;
    /* No data? */
    if (dataoff >= skb->len) {
        pr_debug("ftp: dataoff(%u) >= skblen(%u)\n", dataoff,
             skb->len);
        return NF_ACCEPT;
    }
    datalen = skb->len - dataoff;

    spin_lock_bh(&nf_ftp_lock);
    fb_ptr = skb_header_pointer(skb, dataoff, datalen, ftp_buffer);
    BUG_ON(fb_ptr == NULL);

    ends_in_nl = (fb_ptr[datalen - 1] == '\n');
    seq = ntohl(th->seq) + datalen;

    /* Look up to see if we're just after a \n. */
    if (!find_nl_seq(ntohl(th->seq), ct_ftp_info, dir)) {
        /* We're picking up this, clear flags and let it continue */
        if (unlikely(ct_ftp_info->flags[dir] & NF_CT_FTP_SEQ_PICKUP)) {
            ct_ftp_info->flags[dir] ^= NF_CT_FTP_SEQ_PICKUP;
            goto skip_nl_seq;
        }

        /* Now if this ends in \n, update ftp info. */
        pr_debug("nf_conntrack_ftp: wrong seq pos %s(%u) or %s(%u)\n",
             ct_ftp_info->seq_aft_nl_num[dir] > 0 ? "" : "(UNSET)",
             ct_ftp_info->seq_aft_nl[dir][0],
             ct_ftp_info->seq_aft_nl_num[dir] > 1 ? "" : "(UNSET)",
             ct_ftp_info->seq_aft_nl[dir][1]);
        ret = NF_ACCEPT;
        goto out_update_nl;
    }

skip_nl_seq:
    /* Initialize IP/IPv6 addr to expected address (it's not mentioned
       in EPSV responses) */
    cmd.l3num = nf_ct_l3num(ct);
    memcpy(cmd.u3.all, &ct->tuplehash[dir].tuple.src.u3.all,
           sizeof(cmd.u3.all));

    for (i = 0; i < ARRAY_SIZE(search[dir]); i++) {
        found = find_pattern(fb_ptr, datalen,
                     search[dir][i].pattern,
                     search[dir][i].plen,
                     search[dir][i].skip,
                     search[dir][i].term,
                     &matchoff, &matchlen,
                     &cmd,
                     search[dir][i].getnum);
        if (found) break;
    }
    if (found == -1) {
        /* We don't usually drop packets.  After all, this is
           connection tracking, not packet filtering.
           However, it is necessary for accurate tracking in
           this case. */
        nf_ct_helper_log(skb, ct, "partial matching of `%s'",
                     search[dir][i].pattern);
        ret = NF_DROP;
        goto out;
    } else if (found == 0) { /* No match */
        ret = NF_ACCEPT;
        goto out_update_nl;
    }

    pr_debug("conntrack_ftp: match `%.*s' (%u bytes at %u)\n",
         matchlen, fb_ptr + matchoff,
         matchlen, ntohl(th->seq) + matchoff);

    exp = nf_ct_expect_alloc(ct);
    if (exp == NULL) {
        nf_ct_helper_log(skb, ct, "cannot alloc expectation");
        ret = NF_DROP;
        goto out;
    }

    /* We refer to the reverse direction ("!dir") tuples here,
     * because we're expecting something in the other direction.
     * Doesn't matter unless NAT is happening.  
     *Get the destination address in the opposite direction
     */
    daddr = &ct->tuplehash[!dir].tuple.dst.u3;

    /* Update the ftp info */
    if ((cmd.l3num == nf_ct_l3num(ct)) &&
        memcmp(&cmd.u3.all, &ct->tuplehash[dir].tuple.src.u3.all,
             sizeof(cmd.u3.all))) {
        /* Enrico Scholz's passive FTP to partially RNAT'd ftp
           server: it really wants us to connect to a
           different IP address.  Simply don't record it for
           NAT. */
        if (cmd.l3num == PF_INET) {
            pr_debug("NOT RECORDING: %pI4 != %pI4\n",
                 &cmd.u3.ip,
                 &ct->tuplehash[dir].tuple.src.u3.ip);
        } else {
            pr_debug("NOT RECORDING: %pI6 != %pI6\n",
                 cmd.u3.ip6,
                 ct->tuplehash[dir].tuple.src.u3.ip6);
        }

        /* Thanks to Cristiano Lincoln Mattos
           <[email protected]> for reporting this potential
           problem (DMZ machines opening holes to internal
           networks, or the packet filter itself). */
        if (!loose) {
            ret = NF_ACCEPT;
            goto out_put_expect;
        }
        daddr = &cmd.u3;
    }
    /* Initialize the expectation: the opposite direction's source address is the source, daddr is the destination, and the port parsed from the payload is the destination port. */
    nf_ct_expect_init(exp, NF_CT_EXPECT_CLASS_DEFAULT, cmd.l3num,
              &ct->tuplehash[!dir].tuple.src.u3, daddr,
              IPPROTO_TCP, NULL, &cmd.u.tcp.port);

    /* Now, NAT might want to mangle the packet, and register the
     * (possibly changed) expectation itself. */
    nf_nat_ftp = rcu_dereference(nf_nat_ftp_hook);
    if (nf_nat_ftp && ct->status & IPS_NAT_MASK)
        ret = nf_nat_ftp(skb, ctinfo, search[dir][i].ftptype,
                 protoff, matchoff, matchlen, exp);
    else {
        /* Can't expect this?  Best to drop packet now. */
        if (nf_ct_expect_related(exp) != 0) {
            nf_ct_helper_log(skb, ct, "cannot add expectation");
            ret = NF_DROP;
        } else
            ret = NF_ACCEPT;
    }

out_put_expect:
    nf_ct_expect_put(exp);

out_update_nl:
    /* Now if this ends in \n, update ftp info.  Seq may have been
     * adjusted by NAT code. */
    if (ends_in_nl)
        update_nl_seq(ct, seq, ct_ftp_info, dir, skb);
 out:
    spin_unlock_bh(&nf_ftp_lock);
    return ret;
}

Note the following statement:

/* Now, NAT might want to mangle the packet, and register the
     * (possibly changed) expectation itself. */
    nf_nat_ftp = rcu_dereference(nf_nat_ftp_hook);
    if (nf_nat_ftp && ct->status & IPS_NAT_MASK)
        ret = nf_nat_ftp(skb, ctinfo, search[dir][i].ftptype,
                 protoff, matchoff, matchlen, exp);

As you can see, the NAT helper is only called when NAT is configured on the main connection. Without NAT on the main connection, exp->expectfn is never set.
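The branch can be modeled in a few lines of user-space C. All names here (conn_model, expect_model, register_expect, follow_master_stub) are hypothetical stand-ins, and the sketch assumes that the NAT helper is the only place where the callback gets installed for FTP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct conn_model;
struct expect_model;

typedef void (*expectfn_t)(struct conn_model *ct, struct expect_model *exp);

/* nat_configured stands in for (ct->status & IPS_NAT_MASK) != 0. */
struct conn_model   { bool nat_configured; };
struct expect_model { expectfn_t expectfn; };

/* Stand-in for the NAT callback; its body does not matter for this sketch. */
void follow_master_stub(struct conn_model *ct, struct expect_model *exp)
{
    (void)ct;
    (void)exp;
}

/* Mirrors the if/else at the end of help(). */
void register_expect(const struct conn_model *master, bool nat_hook_loaded,
                     struct expect_model *exp)
{
    exp->expectfn = NULL;  /* state left by the init function */

    if (nat_hook_loaded && master->nat_configured)
        exp->expectfn = follow_master_stub;  /* NAT helper installs the callback */
    /* else: the expectation is registered unchanged, callback stays NULL */
}
```

The callback stays NULL both when the main connection has no NAT and when the NAT helper hook is not loaded at all.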

Passive mode

Passive mode works the same way. The client sends the PASV command, and the server replies with its address and port as xxx,xxx,xxx,xxx,ppp,ppp. The help function intercepts this reply and creates an expectation. Assume the client IP of the main connection is cip:

dip/mask      = dataip/0xffffffff
dport/mask    = dataport/0xffff
sip/mask      = cip/0xffffffff
sport/mask    = 0/0
protocol = tcp

In both modes, the expectation describes the request direction of the child connection.

With NAT in place, what else do the helper and the expectation have to do?

NAT on the client side (the common scenario)

Active mode

The client sends PORT xxx,xxx,xxx,xxx,ppp,ppp, and the NAT device intercepts the packet through the help function. Because the client has a private address, the address in its PORT command is also private; if it were forwarded unchanged, the server could not connect to it. The NAT device therefore selects a public address (the same one used for the main connection, or a different one) and a port number for the data connection, and writes the translated address back into the PORT command. Suppose the client sends PORT 10.10.10.10 10000 and the NAT device replaces it with 1.1.1.1 10000. The server will then connect to 1.1.1.1:10000. The NAT device must create an expectation for the request direction of the child connection:

dip/mask      = 1.1.1.1/0xffffffff
dport/mask    = 10000/0xffff
sip/mask      = rip/0xffffffff   (the server may choose a different source IP, so using the server address of the parent connection is not strictly accurate, but it holds in most normal cases)
sport/mask    = 0/0
protocol = tcp
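On the wire, the PORT argument is six comma-separated decimal bytes, with the port encoded as p1*256 + p2. The following sketch parses a PORT line and writes a translated one; parse_port_cmd and format_port_cmd are hypothetical helper names, not the kernel's functions, and the addresses are the example values used in this section.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse "PORT h1,h2,h3,h4,p1,p2". Returns 0 on success, -1 on error. */
int parse_port_cmd(const char *line, unsigned ip[4], unsigned *port)
{
    unsigned p1, p2;

    if (sscanf(line, "PORT %u,%u,%u,%u,%u,%u",
               &ip[0], &ip[1], &ip[2], &ip[3], &p1, &p2) != 6)
        return -1;
    for (int i = 0; i < 4; i++)
        if (ip[i] > 255)
            return -1;
    if (p1 > 255 || p2 > 255)
        return -1;
    *port = p1 * 256 + p2;  /* high byte first */
    return 0;
}

/* Write a PORT line for the given address and port.
 * Returns the line length, or -1 if the buffer is too small. */
int format_port_cmd(char *buf, size_t len, const unsigned ip[4], unsigned port)
{
    int n = snprintf(buf, len, "PORT %u,%u,%u,%u,%u,%u\r\n",
                     ip[0], ip[1], ip[2], ip[3], port / 256, port % 256);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

Parsing "PORT 10,10,10,10,39,16" yields port 10000, and formatting it for 1.1.1.1 produces a shorter line. That length change is the reason the control connection needs sequence number adjustment once the payload has been rewritten.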

Assume the server connects from 2.2.2.2:20 to 1.1.1.1:10000. The NAT device then creates a new connection tracking entry for the data channel:

Request direction:
dip      = 1.1.1.1
dport    = 10000
sip      = 2.2.2.2
sport    = 20
protocol = tcp

Response direction:
dip      = 2.2.2.2
dport    = 20
sip      = 1.1.1.1
sport    = 10000
protocol = tcp

With this entry in place, every request-direction packet finds its connection tracking entry. However, the request packet that the client actually receives has already been through DNAT:

dip      = 10.10.10.10
dport    = 10000
sip      = 2.2.2.2
sport    = 20
protocol = tcp

and the reply the client sends is:

dip      = 2.2.2.2
dport    = 20
sip      = 10.10.10.10
sport    = 10000
protocol = tcp

This reply does not match the reply-direction tuple above, so it cannot hit the connection tracking entry.

So there are two things to do for the child connection in this case:

1. Build the NAT information for the child connection; the NAT module then translates packets according to it.

2. Correct the reply-direction 5-tuple of the connection tracking entry so that packets from the client hit it.

Who does these two things?

The answer is the expectfn callback, which init_conntrack invokes:

if (exp) { /* run the expectation callback */
    if (exp->expectfn)
        exp->expectfn(ct, exp);
    nf_ct_expect_put(exp);
}

For FTP, this function is nf_nat_follow_master. As shown above, the request direction of the data channel needs DNAT. exp->dir is set relative to the main connection: it is the opposite of the direction in which the help function saw the PORT command. In active mode the PORT command travels in the request direction, so the opposite is the reply direction, and exp->dir is IP_CT_DIR_REPLY.

/* Setup NAT on this expected conntrack so it follows master. */
/* If we fail to get a free NAT slot, we'll get dropped on confirm */
void nf_nat_follow_master(struct nf_conn *ct,
              struct nf_conntrack_expect *exp)
{
    struct nf_nat_range range;

    /* This must be a fresh one. */
    BUG_ON(ct->status & IPS_NAT_DONE_MASK);

    /* Change src to where master sends to */
    range.flags = NF_NAT_RANGE_MAP_IPS;
    range.min_addr = range.max_addr
        = ct->master->tuplehash[!exp->dir].tuple.dst.u3;
    nf_nat_setup_info(ct, &range, NF_NAT_MANIP_SRC);

    /* For DST manip, map port here to where it's expected. */
    /*DNAT treatment*/
    range.flags = (NF_NAT_RANGE_MAP_IPS | NF_NAT_RANGE_PROTO_SPECIFIED);
    range.min_proto = range.max_proto = exp->saved_proto;
    range.min_addr = range.max_addr
        = ct->master->tuplehash[!exp->dir].tuple.src.u3; /* request-direction source IP of the main connection, i.e. the client's IP on the main connection */
    nf_nat_setup_info(ct, &range, NF_NAT_MANIP_DST); /* build the NAT information */
}

Now focus on how the NAT information is built:

/* Modify the tuple according to the given NAT type and range */
unsigned int
nf_nat_setup_info(struct nf_conn *ct,
          const struct nf_nat_range *range,
          enum nf_nat_manip_type maniptype)
{
    struct net *net = nf_ct_net(ct); /* network namespace the connection tracking entry belongs to */
    struct nf_conntrack_tuple curr_tuple, new_tuple;

    /* Can't setup nat info for confirmed ct. */
    /* NAT information is not built for a connection that is already confirmed */
    if (nf_ct_is_confirmed(ct))
        return NF_ACCEPT;

    WARN_ON(maniptype != NF_NAT_MANIP_SRC &&
        maniptype != NF_NAT_MANIP_DST);

    if (WARN_ON(nf_nat_initialized(ct, maniptype)))
        return NF_DROP;

    /* What we've got will look like inverse of reply. Normally
     * this is what is in the conntrack, except for prior
     * manipulations (future optimization: if num_manips == 0,
     * orig_tp = ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple)
     *Get five tuples of request direction
     */
    nf_ct_invert_tuplepr(&curr_tuple,
                 &ct->tuplehash[IP_CT_DIR_REPLY].tuple);
    /*Get the five tuples of request direction after NAT according to the five tuples of request direction*/
    get_unique_tuple(&new_tuple, &curr_tuple, range, ct, maniptype);
    /*If the five tuples in the new request direction are different from the original one, the five tuples in the response direction need to be changed*/
    if (!nf_ct_tuple_equal(&new_tuple, &curr_tuple)) {
        struct nf_conntrack_tuple reply;

        /* Alter conntrack table so will recognize replies. */
        /*Get a new quintuple of response direction from the new quintuple*/
        nf_ct_invert_tuplepr(&reply, &new_tuple);
        /*Replace five tuples of response direction*/
        nf_conntrack_alter_reply(ct, &reply);

        /* Non-atomic: we own this at the moment. */
        if (maniptype == NF_NAT_MANIP_SRC)
            ct->status |= IPS_SRC_NAT;
        else
            ct->status |= IPS_DST_NAT;
        /*Determine whether help exists in the connection, if so, you must add the SEQ adj extension function*/
        if (nfct_help(ct) && !nfct_seqadj(ct))
            if (!nfct_seqadj_ext_add(ct))
                return NF_DROP;
    }
    /* For source NAT, add the connection to the nf_nat_bysource hash table */
    if (maniptype == NF_NAT_MANIP_SRC) {
        unsigned int srchash;
        spinlock_t *lock;

        srchash = hash_by_src(net,
                      &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple);
        lock = &nf_nat_locks[srchash % CONNTRACK_LOCKS];
        spin_lock_bh(lock);
        hlist_add_head_rcu(&ct->nat_bysource,
                   &nf_nat_bysource[srchash]);
        spin_unlock_bh(lock);
    }

    /*It's done. NAT finished*/
    if (maniptype == NF_NAT_MANIP_DST)
        ct->status |= IPS_DST_NAT_DONE;
    else
        ct->status |= IPS_SRC_NAT_DONE;

    return NF_ACCEPT;
}

The ct passed in is:

Request direction:
dip      = 1.1.1.1
dport    = 10000
sip      = 2.2.2.2
sport    = 20
protocol = tcp

Response direction:
dip      = 2.2.2.2
dport    = 20
sip      = 1.1.1.1
sport    = 10000
protocol = tcp

The statement:

    nf_ct_invert_tuplepr(&curr_tuple,
                 &ct->tuplehash[IP_CT_DIR_REPLY].tuple);

inverts the reply-direction tuple of ct. The reply-direction tuple is:

Response direction:
dip      = 2.2.2.2
dport    = 20
sip      = 1.1.1.1
sport    = 10000
protocol = tcp

so curr_tuple is the request direction:

Request direction
dip      = 1.1.1.1
dport    = 10000
sip      = 2.2.2.2
sport    = 20
protocol = tcp

Next comes the call to get_unique_tuple, whose implementation follows:

get_unique_tuple(&new_tuple, &curr_tuple, range, ct, maniptype);

/* Manipulate the tuple into the range given. For NF_INET_POST_ROUTING,
 * we change the source to map into the range. For NF_INET_PRE_ROUTING
 * and NF_INET_LOCAL_OUT, we change the destination to map into the
 * range. It might not be possible to get a unique tuple, but we try.
 * At worst (or if we race), we will end up with a final duplicate in
 * __ip_conntrack_confirm and drop the packet. */
static void
get_unique_tuple(struct nf_conntrack_tuple *tuple,
         const struct nf_conntrack_tuple *orig_tuple,
         const struct nf_nat_range *range, /* here: the client IP */
         struct nf_conn *ct,
         enum nf_nat_manip_type maniptype) /* here: destination NAT */
{
    const struct nf_conntrack_zone *zone;
    const struct nf_nat_l3proto *l3proto;
    const struct nf_nat_l4proto *l4proto;
    struct net *net = nf_ct_net(ct);

    zone = nf_ct_zone(ct);

    rcu_read_lock();
    l3proto = __nf_nat_l3proto_find(orig_tuple->src.l3num);
    l4proto = __nf_nat_l4proto_find(orig_tuple->src.l3num,
                    orig_tuple->dst.protonum);

    /* 1) If this srcip/proto/src-proto-part is currently mapped,
     * and that same mapping gives a unique tuple within the given
     * range, use that.
     *
     * This is only required for source (ie. NAT/masq) mappings.
     * So far, we don't do local source mappings, so multiple
     * manips not an issue.
     */
    if (maniptype == NF_NAT_MANIP_SRC &&
        !(range->flags & NF_NAT_RANGE_PROTO_RANDOM_ALL)) {
        /* try the original tuple first */
        if (in_range(l3proto, l4proto, orig_tuple, range)) {
            if (!nf_nat_used_tuple(orig_tuple, ct)) {
                *tuple = *orig_tuple;
                goto out;
            }
        } else if (find_appropriate_src(net, zone, l3proto, l4proto,
                        orig_tuple, tuple, range)) {
            pr_debug("get_unique_tuple: Found current src map\n");
            if (!nf_nat_used_tuple(tuple, ct))
                goto out;
        }
    }

    /* 2) Select the least-used IP/proto combination in the given range */
    /* The destination IP of the tuple is modified here */
    *tuple = *orig_tuple;
    find_best_ips_proto(zone, tuple, range, ct, maniptype);

    /* 3) The per-protocol part of the manip is made to map into
     * the range to make a unique tuple.
     */

    /* Only bother mapping if it's not already in range and unique */
    if (!(range->flags & NF_NAT_RANGE_PROTO_RANDOM_ALL)) {
        if (range->flags & NF_NAT_RANGE_PROTO_SPECIFIED) {
            if (l4proto->in_range(tuple, maniptype,
                          &range->min_proto,
                          &range->max_proto) &&
                (range->min_proto.all == range->max_proto.all ||
                 !nf_nat_used_tuple(tuple, ct)))
                goto out;
        } else if (!nf_nat_used_tuple(tuple, ct)) {
            goto out;
        }
    }

    /* Last chance: get protocol to try to obtain unique tuple. */
    /* In this FTP example the port is left unchanged */
    l4proto->unique_tuple(l3proto, tuple, range, maniptype, ct);
out:
    rcu_read_unlock();
}

/* For [FUTURE] fragmentation handling, we want the least-used
 * src-ip/dst-ip/proto triple.  Fairness doesn't come into it.  Thus
 * if the range specifies 1.2.3.4 ports 10000-10005 and 1.2.3.5 ports
 * 1-65535, we don't do pro-rata allocation based on ports; we choose
 * the ip with the lowest src-ip/dst-ip/proto usage.
 * Select the least-used IP/protocol combination.
 */
static void
find_best_ips_proto(const struct nf_conntrack_zone *zone,
            struct nf_conntrack_tuple *tuple,
            const struct nf_nat_range *range,
            const struct nf_conn *ct,
            enum nf_nat_manip_type maniptype)
{
    union nf_inet_addr *var_ipp;
    unsigned int i, max;
    /* Host order */
    u32 minip, maxip, j, dist;
    bool full_range;

    /* No IP mapping?  Do nothing. */
    if (!(range->flags & NF_NAT_RANGE_MAP_IPS))
        return;

    if (maniptype == NF_NAT_MANIP_SRC)
        var_ipp = &tuple->src.u3;
    else
        var_ipp = &tuple->dst.u3; /* destination NAT: the destination IP is the field to rewrite */

    /* Fast path: only one choice. The range holds a single address (here the client IP), so use it directly. */
    if (nf_inet_addr_cmp(&range->min_addr, &range->max_addr)) {
        *var_ipp = range->min_addr;
        return;
    }

    if (nf_ct_l3num(ct) == NFPROTO_IPV4)
        max = sizeof(var_ipp->ip) / sizeof(u32) - 1;
    else
        max = sizeof(var_ipp->ip6) / sizeof(u32) - 1;

    /* Hashing source and destination IPs gives a fairly even
     * spread in practice (if there are a small number of IPs
     * involved, there usually aren't that many connections
     * anyway).  The consistency means that servers see the same
     * client coming from the same IP (some Internet Banking sites
     * like this), even across reboots.
     */
    j = jhash2((u32 *)&tuple->src.u3, sizeof(tuple->src.u3) / sizeof(u32),
           range->flags & NF_NAT_RANGE_PERSISTENT ?
            0 : (__force u32)tuple->dst.u3.all[max] ^ zone->id);

    full_range = false;
    for (i = 0; i <= max; i++) {
        /* If first bytes of the address are at the maximum, use the
         * distance. Otherwise use the full range.
         */
        if (!full_range) {
            minip = ntohl((__force __be32)range->min_addr.all[i]);
            maxip = ntohl((__force __be32)range->max_addr.all[i]);
            dist  = maxip - minip + 1;
        } else {
            minip = 0;
            dist  = ~0;
        }

        var_ipp->all[i] = (__force __u32)
            htonl(minip + reciprocal_scale(j, dist));
        if (var_ipp->all[i] != range->max_addr.all[i])
            full_range = true;

        if (!(range->flags & NF_NAT_RANGE_PERSISTENT))
            j ^= (__force u32)tuple->dst.u3.all[i];
    }
}
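When the range holds more than one address, the loop above picks one with minip + reciprocal_scale(j, dist). The sketch below is a user-space copy of that scaling step as I understand it (a 32-bit value is mapped into [0, dist) with a multiply and a shift); pick_ip is a hypothetical helper that handles a single host-order IPv4 word only, not the per-word IPv6 loop.

```c
#include <assert.h>
#include <stdint.h>

/* Map a 32-bit value into the range [0, ep_ro) without a division. */
uint32_t reciprocal_scale(uint32_t val, uint32_t ep_ro)
{
    return (uint32_t)(((uint64_t)val * ep_ro) >> 32);
}

/* Choose a host-order IPv4 address in [minip, maxip] from a hash value. */
uint32_t pick_ip(uint32_t hash, uint32_t minip, uint32_t maxip)
{
    uint32_t dist = maxip - minip + 1;  /* number of addresses in the range */

    return minip + reciprocal_scale(hash, dist);
}
```

For the FTP case in this article the range always contains a single address, so the fast path returns before any hashing takes place.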

Applying the NAT range to curr_tuple replaces its destination IP and yields new_tuple:

dip      = 10.10.10.10
dport    = 10000
sip      = 2.2.2.2
sport    = 20
protocol = tcp

Because new_tuple differs from curr_tuple, the following branch is taken:

    if (!nf_ct_tuple_equal(&new_tuple, &curr_tuple)) { /* new_tuple and curr_tuple differ here */
        struct nf_conntrack_tuple reply;

        /* Alter conntrack table so will recognize replies. */
        /* Derive the new reply-direction tuple from the new tuple */
        nf_ct_invert_tuplepr(&reply, &new_tuple); /* invert new_tuple to get reply */
        /* Replace the reply-direction tuple */
        nf_conntrack_alter_reply(ct, &reply); /* reply becomes the reply-direction tuple of the child connection ct */

        /* Non-atomic: we own this at the moment. */
        if (maniptype == NF_NAT_MANIP_SRC)
            ct->status |= IPS_SRC_NAT;
        else
            ct->status |= IPS_DST_NAT; /* mark destination NAT; when the NAT module sees this flag it rewrites the request-direction destination IP to the reply-direction source IP */
        /* If the connection has a helper, the seqadj extension must be added. The data channel has no helper, so seqadj is not needed here. */
        if (nfct_help(ct) && !nfct_seqadj(ct))
            if (!nfct_seqadj_ext_add(ct))
                return NF_DROP;
    }

The resulting reply tuple is:

dip      = 2.2.2.2
dport    = 20
sip      = 10.10.10.10
sport    = 10000
protocol = tcp

At this point the reply tuple matches the packets the client sends on the data channel.
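The whole active-mode correction (invert the reply tuple, apply DNAT, invert again) can be replayed with plain structs. The names tuple5, invert, dnat_to and fixed_reply are hypothetical; the sketch covers only the single-address case from this example and ignores hashing, locking and uniqueness checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Host-order IPv4 address from four octets. */
#define IPV4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | ((uint32_t)(c) << 8) | (uint32_t)(d))

struct tuple5 {
    uint32_t sip, dip;
    uint16_t sport, dport;
    uint8_t  proto;
};

/* Swap source and destination, as the tuple inversion step does. */
struct tuple5 invert(struct tuple5 t)
{
    struct tuple5 r = { t.dip, t.sip, t.dport, t.sport, t.proto };
    return r;
}

bool tuple_equal(struct tuple5 a, struct tuple5 b)
{
    return a.sip == b.sip && a.dip == b.dip &&
           a.sport == b.sport && a.dport == b.dport && a.proto == b.proto;
}

/* DNAT with a single-address range: only the destination IP changes. */
struct tuple5 dnat_to(struct tuple5 t, uint32_t new_dip)
{
    t.dip = new_dip;
    return t;
}

/* Reply tuple the child connection should carry after DNAT setup. */
struct tuple5 fixed_reply(struct tuple5 old_reply, uint32_t client_ip)
{
    struct tuple5 curr_tuple = invert(old_reply);              /* request direction */
    struct tuple5 new_tuple  = dnat_to(curr_tuple, client_ip); /* after DNAT        */

    return invert(new_tuple);                                  /* new reply tuple   */
}
```

Starting from the reply tuple 1.1.1.1:10000 -> 2.2.2.2:20 and the client address 10.10.10.10, fixed_reply returns 10.10.10.10:10000 -> 2.2.2.2:20, which is exactly the packet the client sends.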

At the same time, ct->status |= IPS_DST_NAT marks the connection as using destination NAT. When the NAT module sees this flag on a request-direction packet, it rewrites the destination IP to the source IP of the reply-direction tuple, which completes the processing of that packet. On a reply-direction packet it performs the reverse operation and rewrites the source IP to the destination IP of the request-direction tuple.

Passive mode

The server replies with xxx,xxx,xxx,xxx,ppp,ppp, and the NAT device intercepts the packet through the help function. The address is a public one, so the NAT device does not need to translate the FTP payload (only the payload is exempt; the control channel itself still needs SNAT) and forwards the content to the client unchanged. Assume the server announces 2.2.2.2 10000. The NAT device must create an expectation for the request direction of the child connection:

dip/mask      = 2.2.2.2/0xffffffff
dport/mask    = 10000/0xffff
sip/mask      = rip/0xffffffff   (rip is the request-direction source IP of the main connection, i.e. the client IP)
sport/mask    = 0/0              (the source port is not known yet)
protocol = tcp

Suppose the client connects from 10.10.10.10:5000 to 2.2.2.2:10000. The NAT device then creates a new connection tracking entry for the data channel:

Request direction:
dip      = 2.2.2.2
dport    = 10000
sip      = 10.10.10.10
sport    = 5000
protocol = tcp

Response direction:
dip      = 10.10.10.10
dport    = 5000
sip      = 2.2.2.2
sport    = 10000
protocol = tcp

This entry cannot match the actual reply packet, which is addressed to the public address:

dip      = 1.1.1.1
dport    = 5000
sip      = 2.2.2.2
sport    = 10000
protocol = tcp

So the reply-direction 5-tuple has to be corrected.

Here exp->dir is again set relative to the main connection: it is the opposite of the direction in which the help function saw the server's reply carrying the address and port. In passive mode that reply travels in the reply direction, so the opposite is the request direction, and exp->dir is IP_CT_DIR_ORIGINAL.

/* Setup NAT on this expected conntrack so it follows master. */
/* If we fail to get a free NAT slot, we'll get dropped on confirm */
void nf_nat_follow_master(struct nf_conn *ct,
              struct nf_conntrack_expect *exp)
{
    struct nf_nat_range range;

    /* This must be a fresh one. */
    BUG_ON(ct->status & IPS_NAT_DONE_MASK);

    /* Change src to where master sends to */
    range.flags = NF_NAT_RANGE_MAP_IPS; /* source NAT */
    range.min_addr = range.max_addr
        = ct->master->tuplehash[!exp->dir].tuple.dst.u3; /* !exp->dir is the reply direction here; take its destination IP */
    nf_nat_setup_info(ct, &range, NF_NAT_MANIP_SRC); /* source NAT */

    /* For DST manip, map port here to where it's expected. */
    /*DNAT treatment*/
    range.flags = (NF_NAT_RANGE_MAP_IPS | NF_NAT_RANGE_PROTO_SPECIFIED);
    range.min_proto = range.max_proto = exp->saved_proto;
    range.min_addr = range.max_addr
        = ct->master->tuplehash[!exp->dir].tuple.src.u3; /* !exp->dir is the reply direction here, so this is the reply-direction source IP of the main connection, i.e. the server IP */
    nf_nat_setup_info(ct, &range, NF_NAT_MANIP_DST); /* build the NAT information */
}

The rest of the reasoning mirrors the active-mode case.
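To compare the two modes side by side, the address selection in nf_nat_follow_master can be reduced to two array lookups on the master connection. The names master_model, nat_plan and follow_master_plan are hypothetical, ports are left out, and the master connection is assumed to be the client-side SNAT case used throughout this article (10.10.10.10 translated to 1.1.1.1, server 2.2.2.2).

```c
#include <assert.h>
#include <stdint.h>

/* Host-order IPv4 address from four octets. */
#define IPV4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | ((uint32_t)(c) << 8) | (uint32_t)(d))

enum dir { DIR_ORIGINAL = 0, DIR_REPLY = 1 };

struct addr_pair { uint32_t src, dst; };

/* Master (control) connection: one address pair per direction. */
struct master_model { struct addr_pair tuplehash[2]; };

/* Addresses the child connection gets for source and destination NAT. */
struct nat_plan { uint32_t snat_addr, dnat_addr; };

struct nat_plan follow_master_plan(const struct master_model *m, enum dir exp_dir)
{
    struct nat_plan p;

    p.snat_addr = m->tuplehash[!exp_dir].dst;  /* "change src to where master sends to" */
    p.dnat_addr = m->tuplehash[!exp_dir].src;  /* destination for the DST manip         */
    return p;
}
```

In active mode (exp_dir is the reply direction) the plan is SNAT to 2.2.2.2, which changes nothing, and DNAT to 10.10.10.10. In passive mode (exp_dir is the original direction) it is SNAT to 1.1.1.1 and DNAT to 2.2.2.2, which changes nothing; the corrected reply tuple then has dip = 1.1.1.1 and matches the server's reply.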