Found a great tool for automation, ops, and web scraping: SeleniumBase

This thing is magic: it is built on Selenium and is easy to manage and use.

https://seleniumbase.io/

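After pip install seleniumbase, the first run automatically downloads the matching chromedriver from Google's API, which makes it a great fit for deploying and running on a VPS (turn on headless mode with headless=True).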

Look at the code:

from seleniumbase import SB

with SB(test=True, uc=True) as sb:
    sb.open("https://google.com/ncr")
    sb.type('[title="Search"]', "SeleniumBase GitHub page\n")
    sb.click('[href*="github.com/seleniumbase/"]')
    sb.save_screenshot_to_logs()  # ./latest_logs/
    print(sb.get_page_title())

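Whoa! Did you see that?

sb.open() opens a web page,

sb.click() clicks the element matched by a CSS selector (the same selector syntax you would pass to querySelector in JS),

sb.type() fills in the text of an input element, and a trailing "\n" in the text presses Enter afterwards.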
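
To make the selector part concrete, here is a minimal sketch. attr_contains and search_and_open are hypothetical helper names of mine, not part of SeleniumBase. The Google selectors are copied from the example above and may stop matching if the page changes.

```python
def attr_contains(attribute: str, fragment: str) -> str:
    """Build a CSS selector for elements whose attribute contains fragment."""
    # Escape backslashes and double quotes so the value stays a valid CSS string.
    escaped = fragment.replace("\\", "\\\\").replace('"', '\\"')
    return f'[{attribute}*="{escaped}"]'


def search_and_open(query: str, link_fragment: str) -> str:
    """Search for query, click the first link containing link_fragment, return the title."""
    # Imported here so attr_contains() can be used without a browser installed.
    from seleniumbase import SB

    with SB(test=True, uc=True) as sb:
        sb.open("https://google.com/ncr")
        # The trailing "\n" presses Enter after the text is typed.
        sb.type('[title="Search"]', query + "\n")
        sb.click(attr_contains("href", link_fragment))
        return sb.get_page_title()
```

Calling search_and_open("SeleniumBase GitHub page", "github.com/seleniumbase/") should replay the example above, assuming Chrome is installed and Google does not show a consent page or captcha first.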

Now for the really impressive part: it can also get past Cloudflare

from seleniumbase import SB

with SB(uc=True, test=True, locale="en") as sb:
    url = "https://gitlab.com/users/sign_in"
    sb.activate_cdp_mode(url)
    sb.uc_gui_click_captcha()

    sb.sleep(2)

Check out how smoothly this one flows:

from seleniumbase import BaseCase
BaseCase.main(__name__, __file__)  # Call pytest

class MyTestClass(BaseCase):
    def test_swag_labs(self):
        self.open("https://www.saucedemo.com")
        self.type("#user-name", "standard_user")
        self.type("#password", "secret_sauce\n")
        self.assert_element("div.inventory_list")
        self.click('button[name*="backpack"]')
        self.click("#shopping_cart_container a")
        self.assert_text("Backpack", "div.cart_item")
        self.click("button#checkout")
        self.type("input#first-name", "SeleniumBase")
        self.type("input#last-name", "Automation")
        self.type("input#postal-code", "77123")
        self.click("input#continue")
        self.click("button#finish")
        self.assert_text("Thank you for your order!")

Some handy methods:

  • assert_element
  • assert_exact_text
  • assert_text
  • type
  • click
  • click_link
  • open
  • js_click
  • go_back
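
All of these are called the same way on sb (or on self inside a BaseCase test), so a whole flow can be written down as plain data. Below is a rough sketch of that idea. ALLOWED, run_steps, CART_STEPS and main are names I made up for illustration. The steps reuse the selectors from the Swag Labs test above.

```python
# Method names from the list above; run_steps() rejects anything else.
ALLOWED = {
    "open", "type", "click", "click_link", "js_click", "go_back",
    "assert_element", "assert_text", "assert_exact_text",
}


def run_steps(sb, steps):
    """Call each (method_name, *args) step on sb, in order."""
    for name, *args in steps:
        if name not in ALLOWED:
            raise ValueError(f"unsupported step: {name}")
        getattr(sb, name)(*args)


# Log in, add the backpack to the cart, and check that it shows up there.
CART_STEPS = [
    ("open", "https://www.saucedemo.com"),
    ("type", "#user-name", "standard_user"),
    ("type", "#password", "secret_sauce\n"),
    ("assert_element", "div.inventory_list"),
    ("click", 'button[name*="backpack"]'),
    ("click", "#shopping_cart_container a"),
    ("assert_text", "Backpack", "div.cart_item"),
]


def main():
    # Imported lazily so run_steps() can be tried without a browser.
    from seleniumbase import SB

    with SB(headless=True) as sb:
        run_steps(sb, CART_STEPS)
```

Running main() should walk through the same login and cart steps as the test above, assuming Chrome is available and the demo site still uses those selectors.
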

Grabbing every matching element:

from seleniumbase import SB

with SB(test=True, uc=True) as sb:
    sb.open("https://el.psy.congroo.com")
    quotes = sb.find_elements("h1")
    for quote in quotes:
        print(quote.text)

Headless browser:

from seleniumbase import SB

with SB(headless=True) as sb:
    sb.open("https://el.psy.congroo.com")
    quotes = sb.find_elements("h1")
    for quote in quotes:
        print(quote.text)
    sb.save_screenshot("webpage_screenshot.png")
    print("Screenshot saved as webpage_screenshot.png")
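
If the same script has to run both on a desktop and on a VPS, one option is to decide on headless mode at runtime. This is only a sketch under my own assumption that "Linux with no DISPLAY or WAYLAND_DISPLAY set" means there is no screen. want_headless and print_headings are hypothetical helpers, not SeleniumBase functions.

```python
import os
import sys


def want_headless(environ=os.environ, platform=sys.platform) -> bool:
    """Guess whether there is no display available (typical for a Linux VPS)."""
    if not platform.startswith("linux"):
        return False
    # Assumption: a Linux box without these variables has no GUI session.
    return not (environ.get("DISPLAY") or environ.get("WAYLAND_DISPLAY"))


def print_headings(url: str) -> None:
    """Open url and print the text of every h1 element."""
    # Imported lazily so want_headless() works without a browser installed.
    from seleniumbase import SB

    with SB(headless=want_headless()) as sb:
        sb.open(url)
        for heading in sb.find_elements("h1"):
            print(heading.text)
```

print_headings("https://el.psy.congroo.com") would then behave like the snippet above on a VPS and open a visible browser window on a desktop.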
