Viewing 7 posts: 1-7 (of 7 total)
  • Author
    Post
  • @137657

    崇鹂
    Guest

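    For example:

    jsg.aks.ac.kr/viewe...dataId=001

    [Screenshot: 搜狗截图20240511111719]

    Until now I have been opening DevTools (F12), clicking through every page from first to last to capture each page's .xml file, and then using dezoomify-rs to batch-download and stitch the tiles.

    That is fine for one or two volumes, but for dozens of volumes it is a waste of life, so I am asking the experts here for advice.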

     

     

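    @137658

    xiaopengyou
    Guest

    Just fill in a short form and the library automatically sends you a full-volume PDF that it generates itself.

    Search the discussion board for it.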

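    @137660

    崇鹂
    Guest

    @xiaopengyou #137658

    Which thread is it? I searched but could not find it.

    The site does let you download a black-and-white PDF directly, but its resolution is far worse than that of the stitched color pages.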

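    @137661

    未曾
    Administrator

    @崇鹂 #137657

    The file name in front of .xml is the same as the thumbnail's file name. Extract the thumbnail URLs and batch-replace them, and you are done.

    You can use EmEditor to extract the corresponding URLs: copy the page source into EmEditor and apply this extraction rule: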

    https://jsg.aks.ac.kr/jsgimg/thumb/(\d+)

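    As shown in the screenshot:
    [Screenshot: 2024-05-11_114558]

    Then, on the extracted results, use Replace > Batch Replace with this rule:

    Find:

    https://jsg.aks.ac.kr/jsgimg/thumb/(\d+)

    Replace with (\1 stands for the digits captured by the regex group above):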

    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/\1.xml

    As shown in the screenshot:
    [Screenshot: 2024-05-11_114422]

    Result:

    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106080153270.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106080202506.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106080213590.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106080237451.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106080249436.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106080303953.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106080322863.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106080342352.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106080400338.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106080411808.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106080425841.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106080443253.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106080504275.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106080516839.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106080531262.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106080543892.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106080555628.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106080611649.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106080625134.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106080642682.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106080700972.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106080713005.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106080726710.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106080740695.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106080752180.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106080806369.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106080824392.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106080836221.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106080854807.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106080913423.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106080926659.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106080944273.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106080955805.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106081009198.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106081020245.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106081044718.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106081057687.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106081112631.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106081128727.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106081141040.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106081154182.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106081207199.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106081218732.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106081231141.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106081243410.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106081257427.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106081355378.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106081410710.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106081432529.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106081451264.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106081506331.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106081518129.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106081529789.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106081543305.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106081600284.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106081616160.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106081703273.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106081716227.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106081735854.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106081747652.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106081805502.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106081820034.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106081831785.xml
    https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/20211106081849526.xml
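
For anyone without EmEditor, the same extract-and-replace step can be sketched in Python. This is only an illustration: the function name `thumbs_to_dzi` is made up here, the page source is assumed to be already saved as a string, and the date path `2021/11/06` is copied from the rule above, so it presumably holds only for this volume.

```python
import re

# Same pattern as the EmEditor rule above (dots escaped for strictness)
THUMB_RE = re.compile(r"https://jsg\.aks\.ac\.kr/jsgimg/thumb/(\d+)")

# Same replacement target as the rule above; the date path is fixed here
DZI_TEMPLATE = "https://jsg.aks.ac.kr/jsgimg/data/images/2021/11/06/dzi/{}.xml"


def thumbs_to_dzi(page_source):
    """Return one .xml URL per distinct thumbnail ID, in page order."""
    ids = THUMB_RE.findall(page_source)
    unique_ids = list(dict.fromkeys(ids))  # drop repeats, keep order
    return [DZI_TEMPLATE.format(image_id) for image_id in unique_ids]
```

Calling `thumbs_to_dzi(html)` on the copied page source should give a list like the one above, ready to be written out one URL per line.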

    You can then fetch them with a batch-download script.
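
One possible shape for such a script, as a rough sketch rather than a tested recipe: it assumes dezoomify-rs (the tool mentioned in the first post) is on the PATH, that the URLs are saved one per line in a text file, and that the largest zoom level is wanted. The helper names `build_command` and `download_all` and the `pages` output folder are invented for this example.

```python
import subprocess
from pathlib import Path


def build_command(xml_url, out_dir):
    """Build the dezoomify-rs call for one .xml URL (largest zoom level)."""
    name = xml_url.rsplit("/", 1)[-1]
    if name.endswith(".xml"):
        name = name[:-4]
    out_file = Path(out_dir) / (name + ".jpg")
    return ["dezoomify-rs", "--largest", xml_url, str(out_file)]


def download_all(list_file, out_dir="pages"):
    """Run dezoomify-rs once per non-empty line of list_file."""
    Path(out_dir).mkdir(exist_ok=True)
    for line in Path(list_file).read_text(encoding="utf-8").splitlines():
        url = line.strip()
        if url:
            # check=True stops at the first failed page
            subprocess.run(build_command(url, out_dir), check=True)
```

`download_all("urls.txt")` would then save each page as `pages/<image id>.jpg`; the pages are fetched one at a time, which is slow but gentle on the server.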

     

    @137663

    xiaopengyou
    Guest

    @崇鹂 #137660

    I always use the method from the thread below. I do not download much and do not need high resolution, so take it as a reference only.

    www.shuge.org/meet/...post-94488

    @137664

    崇鹂
    Guest

    @未曾 #137661

    Got it.

    @xiaopengyou #137663

    Got it.

     

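    @137726

    小透明
    Guest

    Getting the list of .xml URLs with Python is fairly simple, but I do not understand how to download the images from these .xml files.

    The code is below. Replace the book URL with your own; on a successful run it creates a .txt file named after the system time in the folder the script is run from.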

    import datetime
    import json
    import re
    
    import requests
    from bs4 import BeautifulSoup
    
    # Replace with the URL of the book you want
    url = "https://jsg.aks.ac.kr/viewer/viewIMok?dataId=K3-325%7C001#node?depth=2&upPath=001&dataId=001"
    
    response = requests.get(url)
    response.raise_for_status()
    soup = BeautifulSoup(response.content, 'html.parser')
    
    img_items = []
    # Walk every <script> tag and look for the one that defines dataJSon
    for script in soup.find_all('script'):
        script_content = script.text
        if not re.search(r'var dataJSon =', script_content):
            continue
        # re.DOTALL lets the array literal span several lines
        m = re.search(r'imgItems: (\[.*?\])', script_content, re.DOTALL)
        if not m:
            continue
        # Parse the array literal as JSON
        img_items = json.loads(m.group(1))
        break
    
    if not img_items:
        raise SystemExit("imgItems not found in the page")
    
    # Build each .xml URL; the first 8 digits of imgID give the yyyy/mm/dd path
    xml_urls = []
    for item in img_items:
        imgid = item['imgID']
        dzi = imgid[0:4] + '/' + imgid[4:6] + '/' + imgid[6:8] + '/dzi/' + imgid + '.xml'
        xml_urls.append('https://jsg.aks.ac.kr/jsgimg/data/images/' + dzi)
    
    # Name the output file after the current system time
    file_name = datetime.datetime.now().strftime("%Y%m%d-%H%M") + ".txt"
    
    with open(file_name, 'w', encoding='utf-8') as file:
        # One URL per line (writelines() would not add line breaks)
        file.write('\n'.join(xml_urls) + '\n')
    
    print("File written:", file_name)
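
On the question of how an .xml file turns into an image: if these files are standard Deep Zoom Image (DZI) descriptors, which the `/dzi/` path and the use of dezoomify-rs suggest but which is not confirmed here, then the .xml only describes the image size and tile size, and the actual tiles sit in a `<name>_files/<level>/<column>_<row>.<format>` folder next to it. The sketch below computes the tile URLs of the largest level under that assumption; `dzi_tile_urls` is a made-up name, and the tiles would still need to be downloaded and stitched (which is what dezoomify-rs does).

```python
import math
import xml.etree.ElementTree as ET


def dzi_tile_urls(xml_url, xml_data):
    """List the tile URLs of the largest zoom level of a DZI descriptor.

    Assumes the standard Deep Zoom layout; xml_data is the descriptor's
    content (str or bytes), xml_url is where it was downloaded from.
    """
    root = ET.fromstring(xml_data)
    tile_size = int(root.attrib["TileSize"])
    fmt = root.attrib["Format"]
    # The <Size> child may carry an XML namespace, so match on the tag suffix
    size = next(el for el in root if el.tag.endswith("Size"))
    width = int(size.attrib["Width"])
    height = int(size.attrib["Height"])

    # In Deep Zoom the full-resolution level is ceil(log2(longest side))
    max_level = math.ceil(math.log2(max(width, height)))
    cols = math.ceil(width / tile_size)
    rows = math.ceil(height / tile_size)

    base = xml_url.rsplit(".", 1)[0] + "_files"
    return [
        f"{base}/{max_level}/{col}_{row}.{fmt}"
        for row in range(rows)
        for col in range(cols)
    ]
```

In practice it is probably easier to hand each .xml URL to dezoomify-rs, which handles the tile download and stitching itself; the sketch is mainly meant to show what the descriptor contains.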

     

Thread: How can book pages from the Academy of Korean Studies (韩国学中央研究院) be batch-downloaded?