🔗 URL Extractor

By ToolNimba Text Team · Updated 2026-06-19

Need to pull every link out of a wall of text? Paste an article, email, log file, chat export or HTML source into the box and this URL extractor finds every web address inside it: full http and https links and bare www. addresses too. It removes duplicates, gives you a clean count and lists each result on its own line ready to copy. Flip on "Domains only" to collapse the list down to unique hostnames, which is perfect for building a quick allowlist, blocklist or outreach target list.

What is the URL Extractor?

A URL (Uniform Resource Locator) is the address of a resource on the web. Most links you meet start with a scheme like https://, but in plain text people also write bare addresses such as www.example.com or even example.com/page. Pulling these out by hand is slow and error-prone, especially in long documents where the same link appears many times. A URL extractor automates the job: it scans the text with a pattern that recognises the shape of a web address, collects every match and hands you a tidy list.

This tool uses a pragmatic pattern that matches http:// and https:// links as well as addresses that begin with www. Each match runs from the scheme (or the www. prefix) up to the first whitespace or bracketing character. Trailing sentence punctuation (period, comma, semicolon, colon, exclamation mark or question mark) is then trimmed, so a link written at the end of a sentence does not keep the period. Results are deduplicated case-insensitively, so a link that appears three times is listed once, and the number of duplicates removed is shown so you can see how much noise was stripped.
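The matching and trimming steps described above can be sketched in a few lines of JavaScript. This is an illustrative sketch only, not the tool's actual source: the names URL_PATTERN, TRAILING_PUNCTUATION and extractUrls are hypothetical, and treating a double quote as a stop character is an added assumption so that links inside HTML attributes end cleanly.

```javascript
// Hypothetical sketch of the matching step; not the tool's real code.
// Match http://, https:// or www. and run up to the first whitespace,
// bracketing character or double quote (the quote is an assumption).
const URL_PATTERN = /\b(?:https?:\/\/|www\.)[^\s<>()\[\]{}"]+/gi;

// Sentence punctuation that is trimmed from the end of each match.
const TRAILING_PUNCTUATION = /[.,;:!?]+$/;

function extractUrls(text) {
  // String.prototype.match returns null when nothing matches.
  const matches = text.match(URL_PATTERN) || [];
  return matches.map((m) => m.replace(TRAILING_PUNCTUATION, ""));
}

console.log(extractUrls("See https://example.com/guide, or www.test.org."));
// → [ 'https://example.com/guide', 'www.test.org' ]
```

The result still contains duplicates at this point; deduplication is a separate step that runs on the trimmed matches.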

The "Domains only" option reduces each URL to its host: it drops the scheme and keeps everything up to the first slash, question mark or hash, then lowercases it. So https://blog.example.com/post?id=9 and https://blog.example.com/about both become blog.example.com and collapse into one entry. A leading www. is treated as part of the host and kept, so www.blog.example.com stays www.blog.example.com. Because every step runs in your browser with plain JavaScript, nothing you paste is uploaded or stored anywhere, which matters when the text contains internal links or private notes.
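As a rough illustration of the "Domains only" reduction, the sketch below drops the scheme, cuts at the first slash, question mark or hash, and lowercases what remains. The function name toHost is hypothetical and the code is an assumption about how such a step could be written, not the tool's implementation.

```javascript
// Hypothetical sketch of the "Domains only" step; not the tool's real code.
function toHost(url) {
  // Drop a leading http:// or https:// if present (www. links have none).
  const withoutScheme = url.replace(/^https?:\/\//i, "");
  // Keep everything up to the first "/", "?" or "#".
  const host = withoutScheme.split(/[\/?#]/)[0];
  // Hostnames are case-insensitive, so normalise to lowercase.
  return host.toLowerCase();
}

console.log(toHost("https://blog.example.com/post?id=9")); // → blog.example.com
console.log(toHost("https://blog.example.com/about"));     // → blog.example.com
console.log(toHost("www.blog.example.com"));               // → www.blog.example.com
```

Because the www. prefix is left in place, www.example.com and example.com remain two separate entries after deduplication.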

When to use it

  • Pulling all outbound links out of an article, newsletter or email for a quick audit.
  • Building a list of unique domains from a research dump or a pile of search results.
  • Extracting URLs from a log file, chat export or raw HTML source to inspect or test them.
  • Collecting outreach or backlink targets, then using Domains only to dedupe to one row per site.
  • Cleaning up a messy paste of links into a single deduplicated, sortable, copy-ready list.

How to use the URL Extractor

  1. Paste or type the text containing links into the input box.
  2. Choose how to separate results: one per line (newline) or comma separated.
  3. Tick "Domains only" to reduce every link to its hostname, or leave it off for full URLs.
  4. Optionally tick "Sort A to Z" to alphabetise the list.
  5. Press Extract URLs, then use Copy results to grab the deduplicated list.
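The separator and sort options in steps 2 and 4 amount to a small formatting step on the finished list. A minimal sketch, assuming a hypothetical formatResults helper and option names (separator, sort) that are not taken from the tool itself; joining with a comma followed by a space is also an assumption:

```javascript
// Hypothetical sketch of the output options; names are illustrative only.
function formatResults(urls, options = {}) {
  const { separator = "newline", sort = false } = options;
  // Copy first so the caller's array is never reordered.
  const list = [...urls];
  if (sort) {
    // Case-insensitive A to Z ordering.
    list.sort((a, b) => {
      const x = a.toLowerCase();
      const y = b.toLowerCase();
      return x < y ? -1 : x > y ? 1 : 0;
    });
  }
  return list.join(separator === "comma" ? ", " : "\n");
}

const results = ["www.test.org", "https://example.com/guide"];
console.log(formatResults(results, { separator: "comma", sort: true }));
// → https://example.com/guide, www.test.org
```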

Formula & method

The extractor matches text of the form (https:// or http:// or www.) followed by characters up to the first space or bracket, then trims trailing sentence punctuation. Domains only keeps the host: drop the scheme, then take everything up to the first / or ? or # and lowercase it. Unique count = total matches minus duplicates removed (compared case-insensitively).
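The counting rule can be checked with a short sketch. It assumes the matches have already been extracted and trimmed; the function name summarize and the shape of the returned object are hypothetical.

```javascript
// Hypothetical sketch of the three counters; not the tool's real code.
function summarize(matches) {
  // A Set of lowercased matches gives the case-insensitive unique count.
  const uniqueKeys = new Set(matches.map((m) => m.toLowerCase()));
  const total = matches.length;
  const unique = uniqueKeys.size;
  return { total, unique, duplicatesRemoved: total - unique };
}

console.log(summarize([
  "https://example.com/guide",
  "https://example.com/guide",
  "www.test.org",
]));
// → { total: 3, unique: 2, duplicatesRemoved: 1 }
```

Rearranged, this is the same identity as in the text: unique = total - duplicatesRemoved.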

Worked examples

You paste: "Read more at https://example.com/guide and also https://example.com/guide, plus www.test.org."

  1. The pattern finds three matches: https://example.com/guide, https://example.com/guide, www.test.org
  2. Trailing punctuation is trimmed, so the second https://example.com/guide loses its comma and the final www.test.org loses its period
  3. Total found = 3
  4. Deduplicating case-insensitively leaves https://example.com/guide and www.test.org
  5. Unique = 2, duplicates removed = 1

Result: 2 unique URLs: https://example.com/guide and www.test.org, 1 duplicate removed.

Same text, but with "Domains only" turned on to collapse links to hostnames.

  1. Each match is reduced to its host: example.com, example.com, www.test.org
  2. The scheme and the /guide path are dropped, but the www. prefix is kept as part of the host
  3. Total found = 3
  4. Deduplicating the hosts leaves example.com and www.test.org
  5. Unique = 2 domains, duplicates removed = 1

Result: 2 unique domains: example.com and www.test.org, 1 duplicate removed.

What the extractor matches and how it handles each case

Input in the text | Matched? | Result (full URL mode) | Result (domains only)
https://example.com/page | Yes | https://example.com/page | example.com
http://sub.example.org | Yes | http://sub.example.org | sub.example.org
www.example.com/path?q=1 | Yes | www.example.com/path?q=1 | www.example.com
Visit example.com. | No | - (no scheme or www) | -
ftp://files.example.com | No | - (only http, https, www) | -

Anatomy of a URL: the parts the tool reads

Part | Example | Used for domains only?
Scheme | https:// | Removed
Host (domain) | blog.example.com | Kept
Path | /post | Removed
Query | ?id=9 | Removed
Fragment | #section | Removed

Common mistakes to avoid

  • Expecting bare domains with no scheme to be caught. A link written as example.com with no https:// and no www. prefix is not matched, because almost any token containing a dot, such as file.txt or the abbreviation e.g., would otherwise be picked up as a false positive. Add https:// or www. if you need such links found.
  • Assuming every scheme is supported. This tool targets web links: http, https and www. addresses. Other schemes such as ftp://, mailto: or tel: are intentionally ignored so the list stays focused on browseable URLs.
  • Forgetting that trailing punctuation is trimmed. A link that ends a sentence, like see https://example.com., has its final period removed so the URL is clean. This is usually what you want. A real URL that genuinely ends in punctuation is rare, but it would be trimmed too.
  • Treating Domains only as a path keeper. Domains only deliberately strips the path, query and fragment. If you need the full address with its page path, leave the option unticked.
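To see why bare domains are left out, compare a deliberately naive pattern that accepts any dotted token with the scheme-or-www rule. Both patterns and the function names below are hypothetical illustrations, not the tool's code.

```javascript
// Hypothetical comparison; neither pattern is taken from the tool itself.
// Naive rule: any run of letters, digits or hyphens joined by dots.
const NAIVE_BARE_DOMAIN = /\b[a-z0-9-]+(?:\.[a-z0-9-]+)+\b/gi;
// Stricter rule: the match must start with http://, https:// or www.
const SCHEME_OR_WWW = /\b(?:https?:\/\/|www\.)[^\s<>()\[\]{}]+/gi;

function naiveBareDomains(text) {
  return text.match(NAIVE_BARE_DOMAIN) || [];
}

function schemeOrWww(text) {
  return text.match(SCHEME_OR_WWW) || [];
}

const sample = "See report.txt, e.g. the summary, or visit example.com today.";
console.log(naiveBareDomains(sample)); // → [ 'report.txt', 'e.g', 'example.com' ]
console.log(schemeOrWww(sample));      // → []
```

The naive rule does find example.com, but it also returns a filename and an abbreviation, which is the noise the stricter rule avoids at the cost of missing the bare domain.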

Glossary

URL
Uniform Resource Locator, the full web address of a page or resource, such as https://example.com/page.
Scheme
The prefix that says how to reach the resource, for example https:// or http://.
Domain (host)
The site name part of a URL, such as example.com or blog.example.com, without the path.
Path
The part after the host that points to a specific page, like /guide or /post.
Deduplicate
To remove repeated entries so each unique link or domain appears only once.
Query string
The part of a URL after a ? that passes parameters, such as ?id=9, dropped in domains only mode.

Frequently asked questions

What kinds of URLs does this tool extract?

It extracts full http:// and https:// links and bare addresses that start with www. These cover the vast majority of browseable web links in normal text. Bare domains with no scheme and no www, and non-web schemes like ftp:// or mailto:, are not matched.

Does it remove duplicate links?

Yes. After extraction the tool deduplicates the list case-insensitively, keeping the first form it sees, and shows you how many duplicates were removed. With Domains only on, it dedupes by hostname so the same site listed many times collapses to one row.
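One way to get "case-insensitive, keep the first form" behaviour is to key a Map on the lowercased text. This is a sketch under that assumption; dedupeKeepFirst is a hypothetical name and the tool may do it differently.

```javascript
// Hypothetical sketch of first-form-wins deduplication.
function dedupeKeepFirst(items) {
  // A Map remembers insertion order, so the output keeps first-seen order.
  const firstSeen = new Map();
  for (const item of items) {
    const key = item.toLowerCase();
    if (!firstSeen.has(key)) {
      firstSeen.set(key, item); // store the original spelling
    }
  }
  return [...firstSeen.values()];
}

console.log(dedupeKeepFirst([
  "https://Example.com/Guide",
  "https://example.com/guide",
  "www.test.org",
]));
// → [ 'https://Example.com/Guide', 'www.test.org' ]
```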

What does the Domains only option do?

It reduces each URL to its host. The scheme is dropped and everything from the first slash, question mark or hash onward is removed, then the host is lowercased. So https://blog.example.com/post?id=9 becomes blog.example.com. It is ideal for building a clean, deduplicated list of sites.

Is my pasted text uploaded anywhere?

No. The extraction runs entirely in your browser using plain JavaScript. Nothing you paste is sent to a server, logged or stored, so it is safe to use with private notes or internal links.

Why was a link at the end of a sentence missing its period?

The tool trims trailing sentence punctuation (period, comma, semicolon, colon, exclamation or question mark) so a link written at the end of a sentence comes out clean. This gives you a usable URL rather than one that ends in a stray punctuation mark.

Can I get the results as a comma separated list?

Yes. Choose the Comma option under Separate by and the results are joined with commas instead of one per line. You can also sort the list A to Z, then press Copy results to put it on your clipboard.