This page explains what the Python script does, how it was designed, what its step-by-step workflow is, and where each part of the code fits. It also includes a window that displays the full script directly inside the HTML page.
The goal of the script is to go through an Internet Archive collection, locate all the
links whose identifier starts with ajc, enter each concert detail page,
and extract structured information to save it in JSON format.
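The link-filtering step described above can be sketched as follows. This is a minimal illustration, not the code from scrap_adam.py: it assumes the collection page exposes plain `<a href="/details/...">` links in its HTML, and the names `DetailLinkParser` and `find_concert_urls` are hypothetical.

```python
from html.parser import HTMLParser

BASE_URL = "https://archive.org"
DETAILS_PREFIX = "/details/"


class DetailLinkParser(HTMLParser):
    """Collects identifiers from <a href="/details/<identifier>"> links."""

    def __init__(self, wanted_prefix="ajc"):
        super().__init__()
        self.wanted_prefix = wanted_prefix
        self.identifiers = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if not href.startswith(DETAILS_PREFIX):
            return
        # Strip query string and fragment, then keep the first path segment.
        path = href[len(DETAILS_PREFIX):].split("?")[0].split("#")[0]
        identifier = path.split("/")[0]
        # Keep only wanted identifiers, skipping duplicates but preserving order.
        if identifier.startswith(self.wanted_prefix) and identifier not in self.identifiers:
            self.identifiers.append(identifier)


def find_concert_urls(collection_html, wanted_prefix="ajc"):
    """Return detail-page URLs for identifiers starting with wanted_prefix."""
    parser = DetailLinkParser(wanted_prefix)
    parser.feed(collection_html)
    return [f"{BASE_URL}{DETAILS_PREFIX}{ident}" for ident in parser.identifiers]


sample_html = """
<a href="/details/ajc02179_dig-mandrakes-1987-09-18">Dig Mandrakes</a>
<a href="/details/some-other-item">Other item</a>
<a href="/details/ajc02179_dig-mandrakes-1987-09-18?tab=about">Same concert</a>
"""
concert_urls = find_concert_urls(sample_html)
print(concert_urls)
```

Using the standard-library parser keeps the sketch free of third-party dependencies. If the real collection page builds its links with JavaScript, the HTML passed in would have to come from a different source.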
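The script keeps only the links whose identifier starts with ajc.
Each concert detail page is reached through a URL of the form https://archive.org/details/ajc... built from that identifier.
On each detail page, the information comes from og:title, datePublished, or AudioObject blocks.
The script was designed in blocks so that each part has a clear mission and can be modified easily. This separation makes the code easier to understand and more robust.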
One block collects the ajc... identifiers.
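As an illustration of one such block, the sketch below reads the og:title meta tag and keeps the part of the title before the extra Internet Archive text. It is an assumption-laden example, not the script's actual code: the " : " separator and the names `OgTitleParser` and `extract_concert_title` are hypothetical.

```python
from html.parser import HTMLParser


class OgTitleParser(HTMLParser):
    """Stores the content of the first <meta property="og:title"> tag."""

    def __init__(self):
        super().__init__()
        self.og_title = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.og_title is not None:
            return
        attr_map = dict(attrs)
        if attr_map.get("property") == "og:title":
            self.og_title = attr_map.get("content")


def extract_concert_title(page_html, separator=" : "):
    """Return the og:title text before the first separator, or None if absent.

    The separator is an assumption about how the site appends its own text.
    """
    parser = OgTitleParser()
    parser.feed(page_html)
    if not parser.og_title:
        return None
    return parser.og_title.split(separator, 1)[0].strip()


sample_page = (
    '<head><meta property="og:title" content="Dig Mandrakes Live at Cabaret '
    'Metro 1987-09-18 : Free Download, Borrow, and Streaming : Internet Archive">'
    "</head>"
)
concert_title = extract_concert_title(sample_page)
print(concert_title)
```

Keeping the title logic in its own function means the separator can be changed in one place if the page format turns out to differ.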
Here you can paste the exact content of your scrap_adam.py file so the page displays the code inside an editor-style interface.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Paste the full content of your Python script here.
# This window is intended to display the scraper inside the page.
import json
import os
import re
import time
# ... rest of the code ...
Keeps only the identifiers that start with ajc.
Reads meta property="og:title" and keeps the useful part of the title, before the extra Internet Archive text.
Finds the AudioObject blocks and extracts the song name, its duration, and the associated MP3 link.
{
  "identifier": "ajc02179_dig-mandrakes-1987-09-18",
  "url": "https://archive.org/details/ajc02179_dig-mandrakes-1987-09-18",
  "concert": "Dig Mandrakes Live at Cabaret Metro 1987-09-18",
  "artist": "Dig Mandrakes",
  "publication_date": "1987-09-18",
  "album_image": "https://archive.org/download/...JPEG",
  "songs": [
    {
      "name": "Bury Your Love Like Treasure",
      "duration": "03:10",
      "duration_iso": "PT0M190S",
      "mp3": "https://archive.org/download/...mp3"
    }
  ]
}
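The relationship between the duration and duration_iso fields in the example above can be sketched as follows. This is a hedged illustration rather than the script's own logic: it assumes durations arrive as ISO 8601 strings limited to hours, minutes, and seconds, and the function names are hypothetical. The MP3 URL is left elided exactly as in the example.

```python
import json
import re

# Matches ISO 8601 time-only durations such as PT0M190S or PT1H2M3S.
_ISO_DURATION = re.compile(
    r"^PT(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+(?:\.\d+)?)S)?$"
)


def iso_duration_to_seconds(value):
    """Convert an ISO 8601 duration like 'PT0M190S' to whole seconds."""
    match = _ISO_DURATION.match(value)
    if match is None or value == "PT":
        raise ValueError(f"Unsupported duration: {value!r}")
    hours = int(match["hours"] or 0)
    minutes = int(match["minutes"] or 0)
    seconds = float(match["seconds"] or 0)
    return int(round(hours * 3600 + minutes * 60 + seconds))


def format_clock(total_seconds):
    """Format seconds as MM:SS; minutes are not capped at 59."""
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes:02d}:{seconds:02d}"


def build_song(name, duration_iso, mp3_url):
    """Build one entry of the 'songs' list in the shape shown above."""
    return {
        "name": name,
        "duration": format_clock(iso_duration_to_seconds(duration_iso)),
        "duration_iso": duration_iso,
        "mp3": mp3_url,
    }


song = build_song(
    "Bury Your Love Like Treasure",
    "PT0M190S",
    "https://archive.org/download/...mp3",  # elided in the source example
)
print(json.dumps(song, indent=2))
```

Storing both the original ISO string and the readable form keeps the source value available if the display format ever needs to change.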
This page is explaining_scraper.html.
It reads scrap_adam.py and inserts it into the window using JavaScript, if the page is served from a local environment with the proper permissions.