Commit 5e1a757

Update the documentation
1 parent 9361c1c commit 5e1a757

2 files changed: 65 additions and 10 deletions

README.md

Lines changed: 3 additions & 3 deletions
@@ -9,7 +9,7 @@
 
 <h1>SeleniumBase</h1>
 
-<p align="center"><a href="https://github.com/seleniumbase/SeleniumBase/"><img src="https://seleniumbase.github.io/cdn/img/super_logo_sb2.png" alt="SeleniumBase" title="SeleniumBase" width="350" /></a></p>
+<p align="center"><a href="https://github.com/seleniumbase/SeleniumBase/"><img src="https://seleniumbase.github.io/cdn/img/super_logo_sb3.png" alt="SeleniumBase" title="SeleniumBase" width="350" /></a></p>
 
 
 <p align="center" class="hero__title"><b>All-in-one Browser Automation Framework:<br />Web Crawling / Testing / Scraping / Stealth</b></p>
@@ -102,7 +102,7 @@ pytest test_demo_site.py
 
 --------
 
-<p align="left"><a href="https://github.com/seleniumbase/SeleniumBase/"><img src="https://seleniumbase.github.io/cdn/img/super_logo_sb2.png" alt="SeleniumBase" title="SeleniumBase" width="232" /></a></p>
+<p align="left"><a href="https://github.com/seleniumbase/SeleniumBase/"><img src="https://seleniumbase.github.io/cdn/img/super_logo_sb3.png" alt="SeleniumBase" title="SeleniumBase" width="232" /></a></p>
 
 <blockquote>
 <p dir="auto"><strong>Explore the README:</strong></p>
@@ -1371,7 +1371,7 @@ pytest --reruns=1 --reruns-delay=1
 
 <p><div><b><a href="https://github.com/mdmintz">https://github.com/mdmintz</a></b></div></p>
 
-<div><a href="https://github.com/seleniumbase/SeleniumBase/"><img src="https://seleniumbase.github.io/cdn/img/super_logo_sb2.png" title="SeleniumBase" width="240" /></a></div>
+<div><a href="https://github.com/seleniumbase/SeleniumBase/"><img src="https://seleniumbase.github.io/cdn/img/super_logo_sb3.png" title="SeleniumBase" width="240" /></a></div>
 <div><a href="https://seleniumbase.io"><img src="https://img.shields.io/badge/docs-seleniumbase.io-11BBAA.svg" alt="SeleniumBase Docs" /></a></div> <div><a href="https://github.com/seleniumbase/SeleniumBase"><img src="https://img.shields.io/badge/tested%20with-SeleniumBase-04C38E.svg" alt="Tested with SeleniumBase" /></a></div> <div><a href="https://github.com/seleniumbase/SeleniumBase/blob/master/LICENSE"><img src="https://img.shields.io/badge/license-MIT-22BBCC.svg" title="SeleniumBase" /></a> <a href="https://gitter.im/seleniumbase/SeleniumBase" target="_blank"><img src="https://img.shields.io/gitter/room/seleniumbase/SeleniumBase.svg" alt="Gitter chat"/></a></div>
 <div><a href="https://pepy.tech/project/seleniumbase" target="_blank"><img src="https://static.pepy.tech/badge/seleniumbase" alt="SeleniumBase PyPI downloads" /></a></div>
 <div><a href="https://github.com/seleniumbase/SeleniumBase/stargazers"><img src="https://img.shields.io/github/stars/seleniumbase/seleniumbase.svg?color=19A57B" title="Stargazers" /></a></div>

help_docs/uc_mode.md

Lines changed: 62 additions & 7 deletions
@@ -19,17 +19,21 @@
 from seleniumbase import Driver
 
 driver = Driver(uc=True)
-driver.uc_open_with_reconnect("https://gitlab.com/users/sign_in", 3)
+url = "https://gitlab.com/users/sign_in"
+driver.uc_open_with_reconnect(url, 3)
 driver.quit()
 ```
 
+<img src="https://seleniumbase.github.io/other/gitlab_bypass.png" title="SeleniumBase" width="370">
+
 👤 Here's an example with the <b><code translate="no">SB</code></b> manager (which has more methods and functionality than the <b><code translate="no">Driver</code></b> format):
 
 ```python
 from seleniumbase import SB
 
 with SB(uc=True) as sb:
-    sb.driver.uc_open_with_reconnect("https://gitlab.com/users/sign_in", 3)
+    url = "https://gitlab.com/users/sign_in"
+    sb.driver.uc_open_with_reconnect(url, 3)
 ```
 
 👤 Here's a longer example, which includes a retry if the CAPTCHA isn't bypassed on the first attempt:
@@ -55,9 +59,8 @@ with SB(uc=True, test=True) as sb:
 from seleniumbase import SB
 
 def open_the_turnstile_page(sb):
-    sb.driver.uc_open_with_reconnect(
-        "https://seleniumbase.io/apps/turnstile", reconnect_time=3,
-    )
+    url = "seleniumbase.io/apps/turnstile"
+    sb.driver.uc_open_with_reconnect(url, reconnect_time=2)
 
 def click_turnstile_and_verify(sb):
     sb.switch_to_frame("iframe")
@@ -77,6 +80,46 @@ with SB(uc=True, test=True) as sb:
 
 <img src="https://seleniumbase.github.io/other/turnstile_click.jpg" title="SeleniumBase" width="440">
 
+👤 Here's an example <b>where the CAPTCHA appears after submitting a form</b>:
+
+```python
+from seleniumbase import SB
+
+with SB(uc=True, test=True, locale_code="en") as sb:
+    url = "https://ahrefs.com/website-authority-checker"
+    input_field = 'input[placeholder="Enter domain"]'
+    submit_button = 'span:contains("Check Authority")'
+    sb.driver.uc_open_with_reconnect(url, 1)  # The bot-check is later
+    sb.type(input_field, "github.com/seleniumbase/SeleniumBase")
+    sb.driver.reconnect(0.1)
+    sb.driver.uc_click(submit_button, reconnect_time=4)
+    sb.wait_for_text_not_visible("Checking", timeout=10)
+    sb.highlight('p:contains("github.com/seleniumbase/SeleniumBase")')
+    sb.highlight('a:contains("Top 100 backlinks")')
+    sb.set_messenger_theme(location="bottom_center")
+    sb.post_message("SeleniumBase wasn't detected!")
+```
+
+<img src="https://seleniumbase.github.io/other/ahrefs_bypass.png" title="SeleniumBase" width="540">
+
+👤 Here, <b>the CAPTCHA appears after clicking to go to the sign-in screen</b>:
+
+```python
+from seleniumbase import SB
+
+with SB(uc=True, test=True, ad_block_on=True) as sb:
+    url = "https://www.thaiticketmajor.com/concert/"
+    sb.driver.uc_open_with_reconnect(url, 5.5)
+    sb.driver.uc_click("button.btn-signin", 4)
+    sb.switch_to_frame('iframe[title*="Cloudflare"]')
+    sb.assert_element("div#success svg#success-icon")
+    sb.switch_to_default_content()
+    sb.set_messenger_theme(location="top_center")
+    sb.post_message("SeleniumBase wasn't detected!")
+```
+
+<img src="https://seleniumbase.github.io/other/ttm_bypass.png" title="SeleniumBase" width="540">
+
 --------
 
 👤 In <b translate="no">UC Mode</b>, <code translate="no">driver.get(url)</code> has been modified from its original version: If anti-bot services are detected from a <code translate="no">requests.get(url)</code> call that's made before navigating to the website, then <code translate="no">driver.uc_open_with_reconnect(url)</code> will be used instead. To open a URL normally in <b translate="no">UC Mode</b>, use <code translate="no">driver.default_get(url)</code>.
@@ -247,7 +290,7 @@ Here are the 3 primary things that <b translate="no">UC Mode</b> does to make bo
 
 For example, if the <b translate="no">Chrome DevTools Console</b> variables aren't renamed, you can expect to find them easily when using <b><code translate="no">selenium</code></b> for browser automation:
 
-<img src="https://seleniumbase.github.io/other/cdc_args.png" title="SeleniumBase" width="380">
+<img src="https://seleniumbase.github.io/other/cdc_args.png" title="SeleniumBase" width="390">
 
 (If those variables are still there, then websites can easily detect your bots.)
 
@@ -278,7 +321,7 @@ The above JS method is used within the <b><code translate="no">SeleniumBase</cod
 
 🏆 <b>Choosing the right CAPTCHA service</b> for your business / website:
 
-<img src="https://seleniumbase.github.io/other/me_se_conf.jpg" title="SeleniumBase" width="340">
+<img src="https://seleniumbase.github.io/other/me_se_conf.jpg" title="SeleniumBase" width="370">
 
 As an ethical hacker / cybersecurity researcher who builds bots that bypass CAPTCHAs for sport, <b>the CAPTCHA service that I personally recommend</b> for keeping bots out is <b translate="no">Google's reCAPTCHA</b>:
 
@@ -288,6 +331,18 @@ Since Google makes Chrome, Google's own <b translate="no">reCAPTCHA</b> service
 
 --------
 
+⚖️ <b>Legal implications of web-scraping</b>:
+
+Based on the following article, https://nubela.co/blog/meta-lost-the-scraping-legal-battle-to-bright-data/, (which outlines a court case where social-networking company: Meta lost the legal battle to data-scraping company: Bright Data), it was determined that web scraping is 100% legal in the eyes of the courts as long as:
+1. The scraping is only done with <b>public data</b> and <b>not private data</b>.
+2. The scraping isn’t done while logged in on the site being scraped.
+
+If the above criteria are met, then scrape away! (According to the article)
+
+(Note: I'm not a lawyer, so I can't officially offer legal advice, but I can direct people to existing articles online where people can find their own answers.)
+
+--------
+
 <img src="https://seleniumbase.github.io/cdn/img/sb_text_f.png" alt="SeleniumBase" title="SeleniumBase" align="center" width="335">
 
 <div><a href="https://github.com/seleniumbase/SeleniumBase"><img src="https://seleniumbase.github.io/cdn/img/sb_logo_gs.png" alt="SeleniumBase" title="SeleniumBase" width="335" /></a></div>
