Reading a sitemap with Python and pushing URLs through the Baidu API
February 1, 2021
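SEO matters a lot for promoting a website. Most search engines provide APIs that let site owners submit URLs proactively, so that new pages get indexed faster.
Baidu provides such an API for getting URLs indexed quickly. The Python script below reads a sitemap.xml file from the local disk and submits its URLs to Baidu through that API.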
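The push API is a plain HTTP POST whose body is a text list of URLs, one per line. The script shells out to curl for this. If you prefer to stay in Python, the request can be built with the standard library alone. This is a minimal sketch, not part of the original script: `build_push_request` and `push_urls` are hypothetical helper names, and the endpoint and the `site`/`token` query parameters are copied from the curl command in the script. `urlencode` percent-encodes the `site` value, which the server should decode back to the same URL.

```python
import urllib.parse
import urllib.request

# Endpoint copied from the curl command used in the script.
BAIDU_PUSH_ENDPOINT = "http://data.zz.baidu.com/urls"


def build_push_request(site_url, token, urls):
    """Build (but do not send) a POST request that pushes `urls` to Baidu.

    The body is plain text with one URL per line, which is what curl
    sends with --data-binary @file.
    """
    query = urllib.parse.urlencode({"site": site_url, "token": token})
    body = "\n".join(urls).encode("utf-8")
    return urllib.request.Request(
        f"{BAIDU_PUSH_ENDPOINT}?{query}",
        data=body,
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


def push_urls(site_url, token, urls, timeout=10):
    """Send the push request and return the raw response body as text."""
    req = build_push_request(site_url, token, urls)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Splitting request construction from sending keeps the request easy to inspect (or log) before anything goes over the network.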
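In the script, only the following parameters need to be changed:
- lastUpdateTimeStr - the time of the previous push. It is compared with the lastmod times in sitemap.xml, and only URLs updated at or after this time are pushed
- siteMapPath - the path of sitemap.xml on the local disk
- siteUrl - the website's URL
- baiduApiToken - the token for the Baidu API
- tmpFile - where to save the temporary file that holds the URLs to push
- ignorePathPrefixes - URL prefixes to ignore; URLs starting with any of them are not pushed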
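The two filtering rules controlled by `lastUpdateTimeStr` and `ignorePathPrefixes` can be tried out in isolation. The sketch below is a hypothetical helper (`should_push` is not in the script) that mirrors the script's logic under these assumptions: timestamps use the same `%Y-%m-%dT%H:%M:%S%z` format as the script, and an entry with no lastmod counts as updated. Because the parsed datetimes are timezone-aware, lastmod values in different time zones compare correctly.

```python
from datetime import datetime

# Same format the script uses, e.g. 2021-01-26T00:00:00+08:00.
# %z accepts the "+08:00" form (with a colon) in Python 3.7+.
SITEMAP_DATETIME_FORMAT = "%Y-%m-%dT%H:%M:%S%z"


def should_push(loc, lastmod_str, last_update_str, ignore_prefixes):
    """Return True if the URL `loc` should be pushed.

    A URL is skipped if it starts with any ignored prefix, or if its
    lastmod is earlier than the previous push time. A missing lastmod
    (None or empty) is treated as updated, as in the script.
    """
    if loc.startswith(tuple(ignore_prefixes)):
        return False
    if not lastmod_str:
        return True
    last_update = datetime.strptime(last_update_str, SITEMAP_DATETIME_FORMAT)
    lastmod = datetime.strptime(lastmod_str, SITEMAP_DATETIME_FORMAT)
    return lastmod >= last_update
```

For example, a lastmod of `2021-01-25T17:00:00+00:00` is 01:00 on January 26 in UTC+8, so it passes a cutoff of `2021-01-26T00:00:00+08:00`.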
```python
#!/usr/bin/env python3
# coding: utf-8
import subprocess
import xml.etree.ElementTree as ET
from datetime import datetime

### Methods #########
def stripNs(el):
    """Recursively remove XML namespaces from this element and its children."""
    if el.tag.startswith("{"):
        el.tag = el.tag.split('}', 1)[1]  # strip namespace
    # Iterate over a copy of the keys, because the dict is modified in the loop.
    for k in list(el.attrib.keys()):
        if k.startswith("{"):
            k2 = k.split('}', 1)[1]
            el.attrib[k2] = el.attrib[k]
            del el.attrib[k]
    for child in el:
        stripNs(child)

### Arguments to change ####
lastUpdateTimeStr = '2021-01-26T00:00:00+08:00'
siteMapPath = 'public/sitemap.xml'
siteUrl = 'https://www.zengxi.net'
baiduApiToken = 'faketoken'
tmpFile = "/tmp/submitSiteMap"
ignorePathPrefixes = [
    'https://www.zengxi.net/archives/',
    'https://www.zengxi.net/categories/',
    'https://www.zengxi.net/links/',
    'https://www.zengxi.net/posts/',
    'https://www.zengxi.net/series/',
    'https://www.zengxi.net/tags/'
]

### CONSTANTS ###
# Matches lastmod values such as 2021-01-26T00:00:00+08:00
# (%z accepts the "+08:00" form in Python 3.7+).
SITEMAP_DATETIME_FORMAT = '%Y-%m-%dT%H:%M:%S%z'

lastUpdateTime = datetime.strptime(lastUpdateTimeStr, SITEMAP_DATETIME_FORMAT)

tree = ET.parse(siteMapPath)
urlset = tree.getroot()

# Write every URL that should be pushed to tmpFile, one per line.
with open(tmpFile, 'w') as f:
    for url in urlset:
        location = ''
        lastmod = lastUpdateTime  # no <lastmod>: treat the URL as updated
        for urlChild in url:
            stripNs(urlChild)
            if urlChild.tag == 'loc':
                location = (urlChild.text or '').strip()
            elif urlChild.tag == 'lastmod':
                lastmod = datetime.strptime(urlChild.text.strip(), SITEMAP_DATETIME_FORMAT)
        # Skip entries without a URL and URLs under an ignored prefix.
        if not location or location.startswith(tuple(ignorePathPrefixes)):
            continue
        if lastmod >= lastUpdateTime:
            f.write(location + '\n')

# POST the file to Baidu as plain text, one URL per line.
# Passing a list to subprocess.run avoids shell quoting problems.
command = [
    'curl', '-H', 'Content-Type:text/plain',
    '--data-binary', '@' + tmpFile,
    'http://data.zz.baidu.com/urls?site={}&token={}'.format(siteUrl, baiduApiToken),
]
print(' '.join(command))
result = subprocess.run(command, capture_output=True, text=True)
print(result.stdout)
```
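The script removes namespaces from each element before checking its tag. Another option is to keep the namespaces and give ElementTree a prefix map. The sketch below assumes the sitemap uses the standard sitemaps.org namespace (`http://www.sitemaps.org/schemas/sitemap/0.9`); `read_sitemap` is a hypothetical helper, not part of the original script.

```python
import xml.etree.ElementTree as ET

# Standard namespace of the sitemaps.org protocol (assumed here).
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def read_sitemap(source):
    """Yield (loc, lastmod) pairs from a sitemap file path or file object.

    lastmod is None when a <url> entry has no <lastmod> child.
    Entries with an empty <loc> are skipped.
    """
    root = ET.parse(source).getroot()
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        if loc:
            yield loc, (lastmod.strip() if lastmod else None)
```

Each yielded pair can then go through the same lastmod and prefix checks before being written to the temporary file. Nothing in the tree is mutated, so the parsed document stays intact.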