Yarn duplicate and Solutions

Time: 2021-10-14

What is Yarn duplicate

Developers who use Yarn as their package manager may find that different versions of a package are bundled repeatedly when the app is built, even if those versions are compatible.

For example, suppose the following dependencies exist:

[Figure: dependency graph of app, lib-a, and lib-b]

When (p)npm installs a module that is already present, it checks whether the installed version satisfies the version range required by the new dependent. If it does, the installation is skipped; if not, the module is installed under the node_modules of the current module. That is, lib-a reuses the copy of lib-b that app already depends on.

However, with Yarn v1 as the package manager, lib-a gets its own separate copy of lib-b.

Think about it: if the app project depends on lib-b@^1.1.0 instead, does the problem go away?

Suppose that when lib-b@^1.1.0 is installed in app, the latest version of lib-b is 1.1.0. Then lib-b@1.1.0 is locked in yarn.lock.

If lib-a is installed some time later, when the latest version of lib-b has become 1.2.0, the Yarn duplicate still appears, so the problem is quite common.
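
To make the mechanism concrete, here is a minimal sketch of a lockfile keyed by "name@range". It is a simplified model, not Yarn's actual resolver: the Lock type, the resolve function, and the ^1.0.0 range requested through lib-a are all assumptions made for illustration.

```typescript
// Hypothetical, simplified lockfile: "name@range" -> resolved version.
type Lock = { [key: string]: string };

// Simplified model of the behavior described above: an existing "name@range"
// key is reused as-is, while a new key is resolved to the latest published
// version without looking at what other keys already resolved to.
function resolve(lock: Lock, name: string, range: string, latest: string): string {
  const key = name + "@" + range;
  if (!(key in lock)) {
    lock[key] = latest;
  }
  return lock[key];
}

const lock: Lock = {};
// app requests lib-b@^1.1.0 while the latest lib-b is 1.1.0.
resolve(lock, "lib-b", "^1.1.0", "1.1.0");
// Later, lib-a requests lib-b@^1.0.0 (assumed range) while the latest is 1.2.0.
resolve(lock, "lib-b", "^1.0.0", "1.2.0");

// Both 1.1.0 and 1.2.0 are now locked, although 1.1.0 alone satisfies both ranges.
console.log(lock);
```

In this model the duplicate comes purely from the two ranges being different keys, which is the situation a deduplication pass can clean up afterwards.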

Although the company’s monorepo projects have been migrated to Rush and pnpm, many projects still use Yarn as the underlying package manager and have no migration plan.

For such projects, we can use yarn-deduplicate, a command line tool that modifies yarn.lock to remove the duplicates.

yarn-deduplicate — The Hero We Need

Basic use

Modify yarn.lock directly according to the default strategy:

npx yarn-deduplicate yarn.lock

Processing strategy

--strategy <strategy>

Highest strategy

By default, the highest installed version is used wherever possible.

Example 1, given the following yarn.lock:

library@^1.0.0:
  version "1.0.0"

library@^1.1.0:
  version "1.1.0"

library@^1.0.0:
  version "1.3.0"

The modified result is as follows:

library@^1.0.0, library@^1.1.0:
  version "1.3.0"

library@^1.0.0 and library@^1.1.0 are both locked at 1.3.0 (the highest version currently installed).

Example 2:

Change library@^1.1.0 to library@1.1.0:

library@^1.0.0:
  version "1.0.0"

library@1.1.0:
  version "1.1.0"

library@^1.0.0:
  version "1.3.0"

The modified result is as follows:

library@1.1.0:
  version "1.1.0"

library@^1.0.0:
  version "1.3.0"

library@1.1.0 is unchanged, and library@^1.0.0 is unified to the highest installed version, 1.3.0.

Fewer strategy

Try to use the fewest packages. Note that this means the smallest number of installed packages, not the lowest version. If the number of installed packages is the same, the highest version is used.

Example 1:

library@^1.0.0:
  version "1.0.0"

library@^1.1.0:
  version "1.1.0"

library@^1.0.0:
  version "1.3.0"

The modified result is as follows:

library@^1.0.0, library@^1.1.0:
  version "1.3.0"

Note: here the result is the same as with the highest strategy.

Example 2:

Change library@^1.1.0 to library@1.1.0:

library@^1.0.0:
  version "1.0.0"

library@1.1.0:
  version "1.1.0"

library@^1.0.0:
  version "1.3.0"

The modified result is as follows:

library@^1.0.0, library@1.1.0:
  version "1.1.0"

As you can see, resolving everything to version 1.1.0 minimizes the number of installed versions.
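
The difference between the two strategies can be sketched in a few lines. This is an illustration under stated assumptions rather than yarn-deduplicate's code: satisfies below is a simplified stand-in for semver.satisfies that only understands exact versions and caret ranges with a major version of at least 1, and pickBest and its input shape are hypothetical.

```typescript
type Strategy = "highest" | "fewer";

const parse = (v: string): number[] => v.split(".").map(Number);

// Negative if a < b, positive if a > b, 0 if equal (plain x.y.z only).
const compare = (a: string, b: string): number => {
  const pa = parse(a);
  const pb = parse(b);
  for (let i = 0; i < 3; i++) {
    if (pa[i] !== pb[i]) return pa[i] - pb[i];
  }
  return 0;
};

// Simplified stand-in for semver.satisfies: exact versions and "^x.y.z" with x >= 1.
const satisfies = (version: string, range: string): boolean => {
  if (range.charAt(0) !== "^") return version === range;
  const base = range.slice(1);
  return parse(version)[0] === parse(base)[0] && compare(version, base) >= 0;
};

// entries: [requested range, installed version] pairs for a single package.
function pickBest(entries: [string, string][], strategy: Strategy): { [range: string]: string } {
  const installed = entries.map(([, v]) => v).filter((v, i, all) => all.indexOf(v) === i);
  // Number of requested ranges that a given installed version can serve.
  const coverage = (v: string): number =>
    entries.filter(([range]) => satisfies(v, range)).length;
  const result: { [range: string]: string } = {};
  for (const [range] of entries) {
    const candidates = installed.filter((v) => satisfies(v, range));
    // "fewer": widest coverage first; ties (and "highest"): highest version first.
    candidates.sort(
      (a, b) => (strategy === "fewer" ? coverage(b) - coverage(a) : 0) || compare(b, a)
    );
    result[range] = candidates[0];
  }
  return result;
}

// Example 2 from above.
const example2: [string, string][] = [
  ["^1.0.0", "1.0.0"],
  ["1.1.0", "1.1.0"],
  ["^1.0.0", "1.3.0"],
];
console.log(pickBest(example2, "highest")); // ^1.0.0 -> 1.3.0, 1.1.0 -> 1.1.0
console.log(pickBest(example2, "fewer")); // ^1.0.0 -> 1.1.0, 1.1.0 -> 1.1.0
```

Version 1.1.0 wins under fewer because it is the only installed version that serves every requested range, so a single copy is enough.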

Progressive change

Deduplicating everything in one go is fast, but it can be risky, so the tool also supports gradual changes.

--packages <package1> <package2> <packageN>

Specify a specific package

--scopes <scope1> <scope2> <scopeN>

Specify the package under a scope

Diagnostic information

--list

Output diagnostic information only

How yarn-deduplicate works

Basic process

Looking at the package.json of yarn-deduplicate, you can see that it depends on the following packages:

  • commander: a complete Node.js command line solution;
  • @yarnpkg/lockfile: parses and writes yarn.lock files;
  • semver: the semantic versioner for npm, which can be used to check whether an installed version satisfies the version range required in package.json.

There are two main files in the source code:

  1. cli.js: the command line entry point. It parses the arguments and calls the methods in index.js accordingly.
  2. index.js: the main logic.

The key logic is in getDuplicatedPackages.

Get Duplicated Packages

First, let's clarify the idea behind getDuplicatedPackages.

Assume the following yarn.lock. The goal is to find the bestVersion of lodash@^4.17.15.

lodash@^4.17.15:
  version "4.17.21"

lodash@4.17.16:
  version "4.17.16"

  1. Parse yarn.lock to learn that lodash@^4.17.15 has requestedVersion ^4.17.15 and installedVersion 4.17.21.
  2. Collect every installedVersion that satisfies the requestedVersion (^4.17.15), i.e. 4.17.21 and 4.17.16.
  3. From those installedVersions, pick the bestVersion under the current strategy (with the fewer strategy, the bestVersion of lodash@^4.17.15 is 4.17.16; otherwise it is 4.17.21).

Type definitions

const getDuplicatedPackages = (
  json: YarnLock,
  options: Options
): DuplicatedPackages => {
  // todo
};

// The object obtained by parsing yarn.lock
interface YarnLock {
  [key: string]: YarnLockVal;
}

interface YarnLockVal {
  version: string; // installedVersion
  resolved: string;
  integrity: string;
  dependencies: {
    [key: string]: string;
  };
}

// A value of roughly this shape
const yarnLockInstanceExample = {
  // ...
  "lodash@^4.17.15": {
    version: "4.17.21",
    resolved:
      "https://registry.npmjs.org/lodash/-/lodash-4.17.21.tgz#679591c564c3bffaae8454cf0b3df370c3d6911c",
    integrity:
      "sha512-v2kDEe57lecTulaDIuNTPy3Ry4gLGJ6Z1O3vE1krgXZNrsQ+LFTGHVxVjcXPs17LhbZVGedAJv8XZ1tvj5FvSg==",
    dependencies: {
      "fake-lib-x": "^1.0.0", // made up for illustration; lodash actually has no dependencies
    },
  },
  // ...
};

// Parsed from the command line arguments
interface Options {
  includeScopes: string[]; // only process packages under these scopes; defaults to []
  includePackages: string[]; // only process these packages; defaults to []
  excludePackages: string[]; // never process these packages; defaults to []
  useMostCommon: boolean; // true when the strategy is fewer
  includePrerelease: boolean; // whether to consider prerelease versions; defaults to false
}

type DuplicatedPackages = PackageInstance[];

interface PackageInstance {
  name: string; // package name, e.g. lodash
  bestVersion: string; // best version under the current strategy
  requestedVersion: string; // requested version range, e.g. ^15.6.2
  installedVersion: string; // installed version, e.g. 15.7.2
}

The ultimate goal is to obtain the list of PackageInstance objects.

Reading the yarn.lock data

const fs = require("fs");
const lockfile = require("@yarnpkg/lockfile");

const parseYarnLock = (file) => lockfile.parse(file).object;

// The file path comes from the command line arguments, parsed by commander
const yarnLock = fs.readFileSync(file, "utf8");
const json = parseYarnLock(yarnLock);

Extract Packages

We need to filter packages according to the options so that only those in the specified range are processed.

Meanwhile, the keys of the yarn.lock object take forms such as lodash@^4.17.15 or lodash@4.17.16, which makes it awkward to look up data by package name.

We can normalize this by using the package name (lodash) as the key and an array as the value, where each array item holds the information for one requested version, ready for later processing.

interface ExtractedPackage {
  pkg: YarnLockVal;
  name: string;
  requestedVersion: string;
  installedVersion: string;
  satisfiedBy: Set<string>;
}

interface ExtractedPackages {
  [key: string]: ExtractedPackage[];
}

satisfiedBy stores every installedVersion that satisfies this package's requestedVersion. It starts out as new Set().

Later, the installedVersion that best fits the strategy, i.e. the bestVersion, is picked from this set.

The specific implementation is as follows:

const extractPackages = (
  json,
  includeScopes = [],
  includePackages = [],
  excludePackages = []
) => {
  const packages = {};
  // Regular expression that splits a yarn.lock key into package name and requested version
  const re = /^(.*)@([^@]*?)$/;

  Object.keys(json).forEach((name) => {
    const pkg = json[name];
    const match = name.match(re);

    let packageName, requestedVersion;
    if (match) {
      [, packageName, requestedVersion] = match;
    } else {
      // No match means no version was specified, which is equivalent to * (https://docs.npmjs.com/files/package.json#dependencies)
      packageName = name;
      requestedVersion = "*";
    }

    // Filter packages according to the options

    // If scopes are specified, only process packages under those scopes
    if (
      includeScopes.length > 0 &&
      !includeScopes.find((scope) => packageName.startsWith(`${scope}/`))
    ) {
      return;
    }

    // If packages are specified, only process those packages
    if (includePackages.length > 0 && !includePackages.includes(packageName))
      return;

    if (excludePackages.length > 0 && excludePackages.includes(packageName))
      return;

    packages[packageName] = packages[packageName] || [];
    packages[packageName].push({
      pkg,
      name: packageName,
      requestedVersion,
      installedVersion: pkg.version,
      satisfiedBy: new Set(),
    });
  });
  return packages;
};
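
The regular expression deserves a closer look, because scoped package names contain an @ themselves. The sketch below reuses the same pattern in a hypothetical helper, splitKey, to show that keys are split at the last @. The @babel/core key is only an illustrative input.

```typescript
// Same pattern as in extractPackages: the greedy (.*) consumes everything up to
// the last "@", and ([^@]*?) captures the requested version after it.
const keyPattern = /^(.*)@([^@]*?)$/;

// Hypothetical helper that mirrors the match/else branch of extractPackages.
function splitKey(key: string): { name: string; requestedVersion: string } {
  const match = key.match(keyPattern);
  if (match) {
    return { name: match[1], requestedVersion: match[2] };
  }
  // A key without "@" carries no version, which is treated as "*".
  return { name: key, requestedVersion: "*" };
}

console.log(splitKey("lodash@^4.17.15")); // name "lodash", requestedVersion "^4.17.15"
console.log(splitKey("@babel/core@^7.0.0")); // name "@babel/core", requestedVersion "^7.0.0"
console.log(splitKey("lodash")); // name "lodash", requestedVersion "*"
```

Because the scope stays part of the name, the includeScopes check in extractPackages can compare the package name against a prefix such as "@babel/".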

After extracting the packages, we need to fill in their satisfiedBy field and compute bestVersion from it, i.e. implement computePackageInstances.

Compute Package Instances

Relevant types are defined as follows:

interface PackageInstance {
  name: string; // package name, e.g. lodash
  bestVersion: string; // best version under the current strategy
  requestedVersion: string; // requested version range, e.g. ^15.6.2
  installedVersion: string; // installed version, e.g. 15.7.2
}

const computePackageInstances = (
  packages: ExtractedPackages,
  name: string,
  useMostCommon: boolean,
  includePrerelease = false
): PackageInstance[] => {
  // todo
};

Implementing computePackageInstances can be divided into three steps:

  1. Collect all installedVersion information for the current package;
  2. Fill in the satisfiedBy field;
  3. Compute bestVersion from satisfiedBy.

Obtaining installedVersion information

/**
 * versions records every installedVersion of the current package.
 * The satisfies field stores the package instances whose requestedVersion
 * is satisfied by that installedVersion. Its initial value is new Set().
 * The size of this set tells us which installedVersion satisfies the most
 * requestedVersions, which is what the fewer strategy needs.
 */
interface Versions {
  [key: string]: { pkg: YarnLockVal; satisfies: Set<ExtractedPackage> };
}

// Entries for the current package name
const packageInstances = packages[name];

const versions = packageInstances.reduce((versions, packageInstance) => {
  if (packageInstance.installedVersion in versions) return versions;
  versions[packageInstance.installedVersion] = {
    pkg: packageInstance.pkg,
    satisfies: new Set(),
  };
  return versions;
}, {} as Versions);

Filling in the satisfiedBy and satisfies fields

const semver = require("semver");

// Iterate over every installedVersion
Object.keys(versions).forEach((version) => {
  const satisfies = versions[version].satisfies;
  // Check each packageInstance in turn
  packageInstances.forEach((packageInstance) => {
    // A packageInstance's own installedVersion always satisfies its requestedVersion
    packageInstance.satisfiedBy.add(packageInstance.installedVersion);
    if (
      semver.satisfies(version, packageInstance.requestedVersion, {
        includePrerelease,
      })
    ) {
      satisfies.add(packageInstance);
      packageInstance.satisfiedBy.add(version);
    }
  });
});

Computing bestVersion from satisfiedBy and satisfies

packageInstances.forEach((packageInstance) => {
  const candidateVersions = Array.from(packageInstance.satisfiedBy);
  // Sort the candidates so that the best version comes first
  candidateVersions.sort((versionA, versionB) => {
    // With the fewer strategy, prefer the version whose satisfies set is largest
    if (useMostCommon) {
      if (versions[versionB].satisfies.size > versions[versionA].satisfies.size)
        return 1;
      if (versions[versionB].satisfies.size < versions[versionA].satisfies.size)
        return -1;
    }
    // Otherwise (the highest strategy, or a tie), prefer the highest version
    return semver.rcompare(versionA, versionB, { includePrerelease });
  });
  packageInstance.satisfiedBy = candidateVersions;
  packageInstance.bestVersion = candidateVersions[0];
});

return packageInstances;

The complete getDuplicatedPackages

const getDuplicatedPackages = (
  json,
  {
    includeScopes,
    includePackages,
    excludePackages,
    useMostCommon,
    includePrerelease = false,
  }
) => {
  const packages = extractPackages(
    json,
    includeScopes,
    includePackages,
    excludePackages
  );
  return Object.keys(packages)
    .reduce(
      (acc, name) =>
        acc.concat(
          computePackageInstances(
            packages,
            name,
            useMostCommon,
            includePrerelease
          )
        ),
      []
    )
    .filter(
      ({ bestVersion, installedVersion }) => bestVersion !== installedVersion
    );
};
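
The walkthrough stops at detecting duplicates. As a rough idea of how such a result could be written back into the parsed lock object, here is a minimal sketch. It is not taken from the yarn-deduplicate source: applyBestVersions, ParsedLock, and Duplicate are hypothetical names, and the approach of pointing each duplicated key at an existing entry that already resolves to bestVersion is an assumption.

```typescript
interface LockEntry {
  version: string;
}
type ParsedLock = { [key: string]: LockEntry };

// The subset of PackageInstance fields this sketch needs.
interface Duplicate {
  name: string;
  requestedVersion: string;
  bestVersion: string;
}

// Returns a copy of the lock in which every duplicated "name@range" key
// points at an existing entry of the same package resolved to bestVersion.
function applyBestVersions(json: ParsedLock, duplicates: Duplicate[]): ParsedLock {
  const result: ParsedLock = { ...json };
  for (const { name, requestedVersion, bestVersion } of duplicates) {
    const donorKey = Object.keys(json).find(
      (key) => key.indexOf(name + "@") === 0 && json[key].version === bestVersion
    );
    if (donorKey !== undefined) {
      result[name + "@" + requestedVersion] = json[donorKey];
    }
  }
  return result;
}

// The lodash example from above, under the fewer strategy.
const parsedLock: ParsedLock = {
  "lodash@^4.17.15": { version: "4.17.21" },
  "lodash@4.17.16": { version: "4.17.16" },
};
const fixedLock = applyBestVersions(parsedLock, [
  { name: "lodash", requestedVersion: "^4.17.15", bestVersion: "4.17.16" },
]);
console.log(fixedLock["lodash@^4.17.15"].version); // 4.17.16
```

Once both keys share one entry, serializing the object again (for example with @yarnpkg/lockfile) would be expected to leave a single installed version of the package.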

Epilogue

This article introduced the Yarn duplicate problem, presented yarn-deduplicate as a solution, analyzed its internal implementation, and looks forward to the arrival of Yarn v2.