Comparing the URLs of organisations to find out whether an entry is a duplicate

For info, a URL is structured like this: protocol://[subdomain.]host[:port][/path][?query][#fragment]
[] means optional!

  1. protocol: normally http or https.

  2. subdomain: e.g. https://naturzukunft.solidcommunity.net/ versus https://helmutwolman.solidcommunity.net

  3. host: e.g. facebook.com

  4. port: in theory something like this is possible: https://anbieter.de:9999/KundeA and https://anbieter.de:4444/KundeB

  5. path
    a. the path consists of any number of tokens separated by /
    b. the fragment identifies a section (or part) of a resource (e.g. an HTML page), e.g. https://helmutwolman.solidcommunity.net/profile/card#me

  6. query: e.g. https://rdf.dev.osalliance.com/rdf4j-server/repositories/kvm?sparql=SELECT * WHERE { ?s ?p ?o . } LIMIT 10
    That already goes into a lot of detail! But I have seen that very complex URLs are stored in KVM. Most of them were on Facebook domains!
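To make the parts listed above concrete, here is a minimal sketch of how a URL could be split into them with Python's standard library. The function name `url_parts` is just made up for illustration. Note that `urllib.parse` does not separate the subdomain from the host; that would need something like the Public Suffix List, so the sketch returns the full host name.

```python
from urllib.parse import urlsplit


def url_parts(url: str) -> dict:
    """Split a URL into the parts discussed above (hypothetical helper)."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,   # full host name, including any subdomain
        "port": parts.port,       # None if no port is given
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }


for example in (
    "https://helmutwolman.solidcommunity.net/profile/card#me",
    "https://anbieter.de:9999/KundeA",
):
    print(url_parts(example))
```

For the first URL this gives the host `helmutwolman.solidcommunity.net`, the path `/profile/card` and the fragment `me`; for the second one the port is `9999` and the path is `/KundeA`.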

@Helmut_Wolman Here you say:

Website: If they have the same URL basis or

Let’s discuss this in more detail, considering the info above.

Of course, we have to ignore platform URLs such as social media.
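As a starting point for that discussion, here is a rough sketch of one possible reading of "same URL basis": same host (ignoring the protocol and a leading `www.`) and same port, with platform hosts excluded. All of this is an assumption, not an agreed rule. The names `PLATFORM_HOSTS`, `url_basis` and `same_url_basis` are made up, and the platform list is only a placeholder.

```python
from urllib.parse import urlsplit

# Placeholder list (assumption); the real list of platforms still has to be agreed on.
PLATFORM_HOSTS = {"facebook.com", "twitter.com"}


def url_basis(url: str) -> tuple:
    """Return (host, port) as one possible definition of the 'URL basis'."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    # Treat the default http/https ports like "no port given".
    port = None if parts.port in (80, 443) else parts.port
    return host, port


def is_platform(host: str) -> bool:
    """True if the host is a platform host or a subdomain of one."""
    return any(host == p or host.endswith("." + p) for p in PLATFORM_HOSTS)


def same_url_basis(a: str, b: str) -> bool:
    """Compare two URLs by host and port; platform URLs never match."""
    basis_a, basis_b = url_basis(a), url_basis(b)
    if not basis_a[0] or is_platform(basis_a[0]) or is_platform(basis_b[0]):
        return False
    return basis_a == basis_b


print(same_url_basis("http://anbieter.de/KundeA", "https://www.anbieter.de/KundeB"))
print(same_url_basis("https://anbieter.de:9999/KundeA", "https://anbieter.de:4444/KundeB"))
print(same_url_basis("https://www.facebook.com/orgA", "https://www.facebook.com/orgB"))
```

With this reading, the first pair counts as the same basis even though the paths differ, while the pair with different ports and the Facebook pair do not match. Whether the path (KundeA versus KundeB) or the subdomain should also decide is exactly the open question.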