🔗 URL Extractor

By ToolNimba Text Team · Updated 2026-06-19

Need to pull every link out of a wall of text? Paste an article, email, log file, chat export or HTML source into the box and this URL extractor finds every web address inside it: full http and https links and bare www. addresses too. It removes duplicates, gives you a clean count and lists each result on its own line ready to copy. Flip on "Domains only" to collapse the list down to unique hostnames, which is perfect for building a quick allowlist, blocklist or outreach target list.

What is the URL Extractor?

A URL (Uniform Resource Locator) is the address of a resource on the web. Most links you meet start with a scheme like https://, but in plain text people also write bare addresses such as www.example.com or even example.com/page. Pulling these out by hand is slow and error-prone, especially in long documents where the same link appears many times. A URL extractor automates the job: it scans the text with a pattern that recognises the shape of a web address, collects every match and hands you a tidy list.

This tool uses a pragmatic pattern that matches http:// and https:// links as well as addresses that begin with www. The match runs from the scheme (or the www. prefix) up to the first whitespace or bracketing character. Trailing sentence punctuation (a period, comma, semicolon, colon, exclamation mark or question mark) is then trimmed so that a link written at the end of a sentence does not keep the period. Results are deduplicated case-insensitively, so a link that appears three times is listed once, and the count of duplicates removed is shown so you can see how much noise was stripped.
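
The matching and trimming steps described above can be sketched in a few lines of JavaScript. This is an illustration under stated assumptions, not the tool's actual source: the names URL_PATTERN, TRAILING_PUNCTUATION and findUrls are hypothetical, and the exact set of characters treated as "bracketing" is a guess.

```javascript
// Hypothetical sketch: names and the exact character sets are assumptions.
// Match http://, https:// or www. followed by everything up to the first
// whitespace or bracketing character (here assumed to be <>()[]{}).
const URL_PATTERN = /(?:https?:\/\/|www\.)[^\s<>()\[\]{}]+/gi;

// Trailing sentence punctuation to strip from the end of each match.
const TRAILING_PUNCTUATION = /[.,;:!?]+$/;

function findUrls(text) {
  // String.prototype.match with a global regex returns null when nothing matches.
  const matches = text.match(URL_PATTERN) || [];
  return matches.map((m) => m.replace(TRAILING_PUNCTUATION, ""));
}

const sample =
  "Read more at https://example.com/guide and also https://example.com/guide, plus www.test.org.";
console.log(findUrls(sample));
// [ 'https://example.com/guide', 'https://example.com/guide', 'www.test.org' ]
```

Duplicates are still present at this stage; removing them is a separate step.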

The "Domains only" option reduces each URL to its host: it drops the scheme and keeps everything up to the first slash, question mark or hash, then lowercases it. So https://blog.example.com/post?id=9 and https://blog.example.com/about both become blog.example.com and collapse into one entry. A leading www. is treated as part of the host and kept, so www.blog.example.com stays www.blog.example.com. Because every step runs in your browser with plain JavaScript, nothing you paste is uploaded or stored anywhere, which matters when the text contains internal links or private notes.
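
The host reduction can be illustrated with a short helper. The function name toHost is hypothetical and the code is a sketch of the steps described above rather than the tool's own implementation. Note that, taken literally, "everything up to the first slash, question mark or hash" also keeps a port number if one is present.

```javascript
// Hypothetical helper: reduce a matched URL to its host as described above.
function toHost(url) {
  // Drop a leading http:// or https:// scheme; www. addresses have none.
  const withoutScheme = url.replace(/^https?:\/\//i, "");
  // Keep everything before the first /, ? or #.
  const host = withoutScheme.split(/[\/?#]/)[0];
  return host.toLowerCase();
}

console.log(toHost("https://blog.example.com/post?id=9")); // blog.example.com
console.log(toHost("https://blog.example.com/about"));     // blog.example.com
console.log(toHost("www.blog.example.com"));               // www.blog.example.com
```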

When to use it

Pulling all outbound links out of an article, newsletter or email for a quick audit.
Building a list of unique domains from a research dump or a pile of search results.
Extracting URLs from a log file, chat export or raw HTML source to inspect or test them.
Collecting outreach or backlink targets, then using Domains only to dedupe to one row per site.
Cleaning up a messy paste of links into a single deduplicated, sortable, copy-ready list.

How to use the URL Extractor

Paste or type the text containing links into the input box.
Choose how to separate results: one per line (newline) or comma-separated.
Tick "Domains only" to reduce every link to its hostname, or leave it off for full URLs.
Optionally tick "Sort A to Z" to alphabetise the list.
Press Extract URLs, then use Copy results to grab the deduplicated list.
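
The separator and sort options in the steps above amount to a small formatting step on the finished list. The sketch below is hypothetical: the function name formatResults, the option names, the case-insensitive sort order and the choice of ", " as the comma separator are all assumptions, since the page does not specify them.

```javascript
// Hypothetical sketch of the output options; names and details are assumptions.
function formatResults(urls, { separator = "newline", sortAZ = false } = {}) {
  // Compare case-insensitively so "A to Z" does not put uppercase first.
  const byLowercase = (a, b) => {
    const x = a.toLowerCase();
    const y = b.toLowerCase();
    return x < y ? -1 : x > y ? 1 : 0;
  };
  // Copy before sorting so the caller's array is left untouched.
  const list = sortAZ ? [...urls].sort(byLowercase) : [...urls];
  return list.join(separator === "comma" ? ", " : "\n");
}

const results = ["www.test.org", "https://example.com/guide"];
console.log(formatResults(results, { separator: "comma", sortAZ: true }));
// https://example.com/guide, www.test.org
```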

Formula & method

The extractor matches text of the form (https:// or http:// or www.) followed by characters up to the first whitespace or bracketing character, then trims trailing sentence punctuation. Domains only keeps the host: drop the scheme, then take everything up to the first / or ? or # and lowercase it. Unique count = total found minus duplicates removed, with entries compared case-insensitively.
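
The counting rule can be made concrete with a short deduplication sketch. The function name dedupe and the shape of the returned object are hypothetical. Keeping the first spelling seen follows the behaviour described in the FAQ.

```javascript
// Hypothetical sketch: case-insensitive dedupe that keeps the first form seen.
function dedupe(matches) {
  const seen = new Set();
  const unique = [];
  for (const match of matches) {
    const key = match.toLowerCase(); // comparison key only; output keeps original case
    if (!seen.has(key)) {
      seen.add(key);
      unique.push(match);
    }
  }
  return {
    total: matches.length,
    unique,
    duplicatesRemoved: matches.length - unique.length,
  };
}

const counts = dedupe([
  "https://example.com/guide",
  "https://Example.com/guide",
  "www.test.org",
]);
console.log(counts.total, counts.unique.length, counts.duplicatesRemoved); // 3 2 1
```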

Worked examples

You paste: "Read more at https://example.com/guide and also https://example.com/guide, plus www.test.org."

The pattern finds three matches: https://example.com/guide, https://example.com/guide, www.test.org
Trailing punctuation is trimmed, so the second link loses its comma and the final www.test.org loses its period
Total found = 3
Deduplicating case-insensitively leaves https://example.com/guide and www.test.org
Unique = 2, duplicates removed = 1

Result: 2 unique URLs (https://example.com/guide and www.test.org), 1 duplicate removed.

Same text, but with "Domains only" turned on to collapse links to hostnames.

Each match is reduced to its host: example.com, example.com, www.test.org
The scheme and the /guide path are dropped, but the www. prefix is kept as part of the host
Total found = 3
Deduplicating the hosts leaves example.com and www.test.org
Unique = 2 domains, duplicates removed = 1

Result: 2 unique domains (example.com and www.test.org), 1 duplicate removed.
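
Both worked examples can be reproduced with one compact, self-contained sketch. It is a hypothetical reconstruction from the description on this page, not the tool's code: the function name extract, the domainsOnly option and the regular expressions are assumptions. The point it illustrates is the order of operations: hosts are reduced before deduplication, so Total found stays at 3 in both modes.

```javascript
// Hypothetical end-to-end sketch; names and patterns are assumptions.
function extract(text, { domainsOnly = false } = {}) {
  const raw = text.match(/(?:https?:\/\/|www\.)[^\s<>()\[\]{}]+/gi) || [];
  const items = raw
    .map((m) => m.replace(/[.,;:!?]+$/, "")) // trim trailing sentence punctuation
    .map((m) =>
      domainsOnly
        ? m.replace(/^https?:\/\//i, "").split(/[\/?#]/)[0].toLowerCase()
        : m
    );
  // Deduplicate case-insensitively, keeping the first form seen.
  const seen = new Set();
  const unique = items.filter((item) => {
    const key = item.toLowerCase();
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
  return { total: items.length, unique, duplicatesRemoved: items.length - unique.length };
}

const sampleText =
  "Read more at https://example.com/guide and also https://example.com/guide, plus www.test.org.";

const fullMode = extract(sampleText);
console.log(fullMode.total, fullMode.unique, fullMode.duplicatesRemoved);
// 3 [ 'https://example.com/guide', 'www.test.org' ] 1

const domainMode = extract(sampleText, { domainsOnly: true });
console.log(domainMode.total, domainMode.unique, domainMode.duplicatesRemoved);
// 3 [ 'example.com', 'www.test.org' ] 1
```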

What the extractor matches and how it handles each case

Input in the text	Matched?	Result (full URL mode)	Result (domains only)
https://example.com/page	Yes	https://example.com/page	example.com
http://sub.example.org	Yes	http://sub.example.org	sub.example.org
www.example.com/path?q=1	Yes	www.example.com/path?q=1	www.example.com
Visit example.com.	No	- (no scheme or www)	-
ftp://files.example.com	No	- (only http, https, www)	-

Anatomy of a URL: the parts the tool reads

Part	Example	Used for domains only?
Scheme	https://	Removed
Host (domain)	blog.example.com	Kept
Path	/post	Removed
Query	?id=9	Removed
Fragment	#section	Removed

Common mistakes to avoid

Expecting bare domains with no scheme to be caught. A link written as example.com with no https:// and no www. prefix is not matched, because almost any word with a dot (such as file.txt or e.g.) would otherwise be picked up as a false positive. Add https:// or www. if you need such links found.
Assuming every scheme is supported. This tool targets web links: http, https and www. addresses. Other schemes such as ftp://, mailto: or tel: are intentionally ignored so the list stays focused on browseable URLs.
Forgetting that trailing punctuation is trimmed. A link that ends a sentence, like see https://example.com., has its final period removed so the URL is clean. This is usually what you want. A real URL that genuinely ends in punctuation is rare, but it would be trimmed too.
Treating Domains only as a path keeper. Domains only deliberately strips the path, query and fragment. If you need the full address with its page path, leave the option unticked.

Glossary

URL: Uniform Resource Locator, the full web address of a page or resource, such as https://example.com/page.
Scheme: The prefix that says how to reach the resource, for example https:// or http://.
Domain (host): The site name part of a URL, such as example.com or blog.example.com, without the path.
Path: The part after the host that points to a specific page, like /guide or /post.
Deduplicate: To remove repeated entries so each unique link or domain appears only once.
Query string: The part of a URL after a ? that passes parameters, such as ?id=9, dropped in domains only mode.

Frequently asked questions

What kinds of URLs does this tool extract?

It extracts full http:// and https:// links and bare addresses that start with www. These cover the vast majority of browseable web links in normal text. Bare domains with no scheme and no www, and non-web schemes like ftp:// or mailto:, are not matched.

Does it remove duplicate links?

Yes. After extraction the tool deduplicates the list case-insensitively, keeping the first form it sees, and shows you how many duplicates were removed. With Domains only on, it dedupes by hostname so the same site listed many times collapses to one row.

What does the Domains only option do?

It reduces each URL to its host. The scheme is dropped and everything from the first slash, question mark or hash onward is removed, then the host is lowercased. So https://blog.example.com/post?id=9 becomes blog.example.com. It is ideal for building a clean, deduplicated list of sites.

Is my pasted text uploaded anywhere?

No. The extraction runs entirely in your browser using plain JavaScript. Nothing you paste is sent to a server, logged or stored, so it is safe to use with private notes or internal links.

Why was a link at the end of a sentence missing its period?

The tool trims trailing sentence punctuation (period, comma, semicolon, colon, exclamation or question mark) so a link written at the end of a sentence comes out clean. This gives you a usable URL rather than one that ends in a stray punctuation mark.

Can I get the results as a comma-separated list?

Yes. Choose the Comma option under Separate by and the results are joined with commas instead of one per line. You can also sort the list A to Z, then press Copy results to put it on your clipboard.