[QUESTION] Scraping The Schema Website

RogerM

Hey guys, as the title says, I'm interested in scraping all the types and properties from the Schema.org website. I want to build my own schema builder that leverages AI.

Do you know if somebody has already done this? Is there a repository somewhere on the internet I could use?

I tried to search for it myself but had no luck.

Thanks in advance...cheers!
 
I've never done this, but I asked ChatGPT and it gave me a Python script to retrieve all the types and properties. Not sure it works; I've never tried it.

Code:
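# In Colab/Jupyter, install the library first: !pip install rdflib
from rdflib import Graph, Namespace
from rdflib.namespace import RDF, RDFS

# Load the RDF graph (RDF/XML dump of the latest Schema.org release)
g = Graph()
g.parse("http://schema.org/version/latest/schemaorg-current-http.rdf", format="xml")

# Define the Schema.org namespace (the "-http" dump uses http:// IRIs)
SCHEMA = Namespace("http://schema.org/")

# Extract all classes (types); Schema.org declares them as rdfs:Class
for s in sorted(g.subjects(predicate=RDF.type, object=RDFS.Class)):
    print(f"Type: {s}")

    # Extract properties for each type: a property names the types
    # it applies to via schema:domainIncludes
    for p in sorted(g.subjects(predicate=SCHEMA["domainIncludes"], object=s)):
        print(f"  Property: {p}")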

This script loads the latest Schema.org RDF file and prints out each type along with its associated properties.
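One thing to keep in mind for a schema builder: this approach only lists properties whose `schema:domainIncludes` names the type directly. In Schema.org a type also inherits every property of its supertypes through `rdfs:subClassOf`, so for example `Product` can use `name`, which is declared on `Thing`. A minimal sketch of collecting the inherited set, assuming you have already built two plain dicts from the graph (the names `direct_props` and `parents` are hypothetical):

```python
from collections import deque


def inherited_properties(type_id, direct_props, parents):
    """Collect properties declared on a type and on all of its supertypes.

    direct_props: type id -> list of property ids declared via domainIncludes
    parents:      type id -> list of direct supertypes (rdfs:subClassOf)
    """
    seen = {type_id}
    queue = deque([type_id])
    props = set()
    while queue:  # breadth-first walk up the class hierarchy
        current = queue.popleft()
        props.update(direct_props.get(current, []))
        for parent in parents.get(current, []):
            if parent not in seen:  # guards against cycles and shared ancestors
                seen.add(parent)
                queue.append(parent)
    return sorted(props)
```

With the rdflib graph from the script, `parents` could be filled from `g.subject_objects(RDFS.subClassOf)` (with `from rdflib.namespace import RDFS`), appending each object to the list for its subject. Using a set plus a `seen` guard keeps each property once even when a type has several supertypes.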
 
@Dopious I loaded the script in Colab. Apart from the missing !pip statement to install the main library, it runs smoothly, but nothing gets downloaded on my end.
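The script only prints to the output cell and never writes a file, so there is nothing to download. One way to save the results, as a minimal sketch: collect them into a dict mapping each type IRI to a list of property IRIs (inside the script's loop, for example `type_props[str(s)] = sorted(str(p) for p in g.subjects(predicate=SCHEMA["domainIncludes"], object=s))`), then write that dict to CSV. The function name and default file name are hypothetical.

```python
import csv


def write_type_properties_csv(type_props, path="schema_types_properties.csv"):
    """Write one row per (type, property) pair, sorted by type.

    Types with no directly declared properties still get one row
    with an empty property column, so no type is silently dropped.
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["type", "property"])
        for type_id in sorted(type_props):
            for prop in type_props[type_id] or [""]:
                writer.writerow([type_id, prop])
    return path
```

In Colab, `from google.colab import files` followed by `files.download(path)` should then pull the CSV to your machine.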

 
YouTube, Facebook, BHW... Roger always hits me with something I didn't know about...

I appreciate you, big dawg.
 