wget command usage

Time: 2021-06-11

1. Usage / command format

wget  [OPTION]...  [URL]...
wget [option list] [URL of the target file or web page]

Arguments that are mandatory for a long option are also mandatory for the corresponding short option.

2. Common parameters

  • Startup:
-V, --version # display the version of wget and exit
-h, --help # print this help
-b, --background # go to background after startup
  • Logging and input file:
-o, --output-file=FILE # write log messages to FILE (see the logging example after this list)
-a, --append-output=FILE # append log messages to FILE
-q, --quiet # quiet mode (no output)
-v, --verbose # verbose output (this is the default)
-nv, --no-verbose # turn off verbose output, without being quiet
-i, --input-file=FILE # download the URLs listed in a local or external FILE
  • Download:
-t, --tries=NUMBER # set the number of retries to NUMBER (0 means unlimited)
     --retry-connrefused # retry even if the connection is refused
-O, --output-document=FILE # write the document to FILE (i.e. save the download under the name FILE)
-nc, --no-clobber # do not re-download files that already exist locally
-c, --continue # resume getting a partially downloaded file
-N, --timestamping # only retrieve files that are newer than the local copies
-S, --server-response # print the server response headers
     --spider # do not download anything
     --limit-rate=RATE # limit the download rate to RATE
     --ignore-case # ignore case when matching files/directories
     --user=USER # set both the FTP and HTTP user name to USER
     --password=PASS # set both the FTP and HTTP password to PASS
     --ask-password # prompt for the password
  • Directories:
-nd, --no-directories # do not create directories
-x, --force-directories # force creation of directories
-nH, --no-host-directories # do not create a host directory (e.g. www.cnblogs.com)
     --protocol-directories # use the protocol name in directories (e.g. create an https/ directory)
-P, --directory-prefix=PREFIX # save files under the directory PREFIX
     --cut-dirs=NUMBER # ignore NUMBER remote directory components
  • HTTP options:
--http-user=USER # set the HTTP user name to USER
--http-password=PASS # set the HTTP password to PASS
--no-cache # do not use data cached by the server
--default-page=NAME # change the default page name (normally "index.html")
--no-cookies # do not use cookies
--save-cookies=FILE # save cookies to FILE after the session ends
  • HSTS options:
--no-hsts # disable HSTS
--hsts-file=FILE # path of the HSTS database (overrides the default)
  • FTP options:
--ftp-user=USER # set the FTP user name to USER
--ftp-password=PASS # set the FTP password to PASS
--no-glob # turn off wildcard expansion (globbing) of FTP file names
--preserve-permissions # preserve remote file permissions
--retr-symlinks # when recursing, retrieve linked-to files (not directories)
  • Recursive download:
-r, --recursive # specify recursive download
-l, --level=NUMBER # maximum recursion depth (inf or 0 means unlimited, i.e. download everything)
     --delete-after # delete files locally after downloading them
     --backups=N # before writing file X, rotate up to N backup files
-K, --backup-converted # before converting file X, back it up as X.orig
  • Recursive accept/reject:
-A, --accept=LIST # comma-separated list of accepted extensions
-R, --reject=LIST # comma-separated list of rejected extensions
     --accept-regex=REGEX # regular expression matching accepted URLs
     --reject-regex=REGEX # regular expression matching rejected URLs
     --regex-type=TYPE # regular expression type (posix|pcre)
-D, --domains=LIST # comma-separated list of accepted domains
     --exclude-domains=LIST # comma-separated list of rejected domains
-I, --include-directories=LIST # list of allowed directories
-X, --exclude-directories=LIST # list of excluded directories
-np, --no-parent # do not ascend to the parent directory
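
A quick illustration of the logging options above (the log file name download.log is an arbitrary example):

wget -o download.log https://www.rarlab.com/rar/rarlinux-6.0.1.tar.gz 	#  write log messages to download.log instead of the terminal
wget -a download.log https://www.rarlab.com/download.htm 	#  append the log of a second download to the same file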

3. Common examples

  • Download a single file / web page
wget  https://www.rarlab.com/rar/rarlinux-6.0.1.tar.gz 	#  Download rarlinux-6.0.1.tar.gz
wget  https://www.rarlab.com/download.htm 		#  Download the download.htm web page
  • Save the download under a specified file name (parameter "-O")
wget -O edit.html https://i.cnblogs.com/posts/editpostId=14660645
#By default the download would be saved as "editpostId=14660645"
#With the -O parameter, it is saved under the specified name instead, here "edit.html"
  • Resume an interrupted download (parameter "-c")
#This parameter is useful for large files when the network speed is poor
#With -c, wget continues the download from the point where the file was interrupted

wget -c https://www.rarlab.com/rar/rarlinux-6.0.1.tar.gz
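#A slow or shared connection can additionally be capped with --limit-rate (the value 300k below is an arbitrary example):
wget -c --limit-rate=300k https://www.rarlab.com/rar/rarlinux-6.0.1.tar.gz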
  • Background download (parameter "-b")
#For large files, the -b parameter moves the download process into the background
#After switching to the background, the progress is written to the log file "wget-log"
#The -t parameter sets the number of retries, e.g. -t 100 for 100 retries; -t 0 means retrying indefinitely until the connection succeeds
wget -b https://www.rarlab.com/rar/rarlinux-6.0.1.tar.gz
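#Follow the progress of the background download:
tail -f wget-log
#Combining the options above (run in the background, retry indefinitely, resume if interrupted):
wget -b -t 0 -c https://www.rarlab.com/rar/rarlinux-6.0.1.tar.gz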
  • Batch download (parameter "-i")
#Create a file URLlist.txt containing all the URLs to download, one per line, then pass that file with the -i parameter
wget -i URLlist.txt
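#For example, URLlist.txt could be built like this (URLs taken from the examples above):
cat > URLlist.txt <<EOF
https://www.rarlab.com/rar/rarlinux-6.0.1.tar.gz
https://www.rarlab.com/download.htm
EOF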
  • Check whether a web page is accessible without downloading it ("-S" prints the response headers, "--spider" skips the download)
wget [-S] --spider https://www.cnblogs.com/cure/p/14660645.html
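#Since wget exits with status 0 when the URL is reachable, the check can be scripted:
wget -q --spider https://www.cnblogs.com/cure/p/14660645.html && echo "reachable" || echo "not reachable"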
  • Download only specified file formats (-A accepts the listed formats, -R rejects the listed formats; both are typically combined with -r)
wget -A png https://www.cnblogs.com/  	 #  or: wget --accept=png https://www.cnblogs.com/
wget -R gif https://www.cnblogs.com/  	 #  or: wget --reject=gif https://www.cnblogs.com/
#LIST means that several formats can be given, comma-separated, e.g. -A png,jpg
  • Download with a user name and password
#This applies to URLs that require user name and password authentication
wget --user=USER --password=PASS  https://www.cnblogs.com/cure/p/14660645.html 	 #  the password appears on the command line in clear text
wget --user=USER --ask-password  https://www.cnblogs.com/cure/p/14660645.html 	 #  the password is prompted for interactively and is not displayed (recommended)
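#To set credentials for one protocol only, the protocol-specific variants from the parameter list can be used instead:
wget --http-user=USER --http-password=PASS https://www.cnblogs.com/cure/p/14660645.html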
  • Download all files under a URL path
#Scenario: download all the files under a certain path on a file server, without keeping the index page "index.html"

wget -r -np -nd -R html,tmp  https://www.cnblogs.com/cure/p/
Or:
wget -r -np -nd -A txt,zip,png[...]  https://www.cnblogs.com/cure/p/

#Parameter introduction:
#-r recursive download (best combined with -np, otherwise the whole site and any linked sites will be downloaded)
#-np do not ascend to the parent directory (only the given path is visited)
#-nd do not create directories
#-R reject the listed formats (wget still fetches the index page "index.html" to discover links; rejecting html removes it after parsing, and tmp is rejected too to catch the temporary html.tmp copy)
#-A accept only the listed formats (files in other formats, including the index page "index.html", are not kept)
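
#To keep a directory structure instead of flattening it with -nd, the directory options from the parameter list can shape the layout; e.g. drop the host directory and the first two path components ("cure/p"):
wget -r -np -nH --cut-dirs=2 -A txt,zip https://www.cnblogs.com/cure/p/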
