devtools Module

This module contains the developer tools for both users and library contributors.

Examples

Collect captchas.

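from amazoncaptcha import AmazonCaptchaCollector

output_folder_path = 'path/to/folder'  # where the captcha images are stored
simultaneous_processes = 4             # number of worker processes
target = 200                           # number of captchas to collect

# The collector runs several processes, so guard the entry point;
# this keeps child processes from re-running the script on import.
if __name__ == '__main__':
    collector = AmazonCaptchaCollector(output_folder_path)
    collector.start(target, simultaneous_processes)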

Run accuracy tests.

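from amazoncaptcha import AmazonCaptchaCollector

output_folder_path = 'path/to/folder'  # where the logs are stored
simultaneous_processes = 4             # number of worker processes
target = 200                           # number of captchas to solve

# With accuracy_test=True, images are solved and logged but not downloaded.
if __name__ == '__main__':
    collector = AmazonCaptchaCollector(output_folder_path, accuracy_test=True)
    collector.start(target, simultaneous_processes)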

The AmazonCaptchaCollector Class

class amazoncaptcha.devtools.AmazonCaptchaCollector(output_folder_path, keep_logs=True, accuracy_test=False)
__init__(output_folder_path, keep_logs=True, accuracy_test=False)

Initializes the AmazonCaptchaCollector instance.

Parameters:
  • output_folder_path (str) – Folder where images or logs should be stored.
  • keep_logs (bool, optional) – If set to True, unsolved captcha links will be stored separately.
  • accuracy_test (bool, optional) – If set to True, AmazonCaptchaCollector will not download images but just solve them and log the results.
_distribute_collecting(milestone)

Distribution function for multiprocessing.

_extract_captcha_id(captcha_link)

Extracts a captcha id from a captcha link.

Parameters: captcha_link (str) – A link to the captcha image.
Returns: Captcha ID.
Return type: str
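
As a rough illustration of the idea, the sketch below derives an identifier from a captcha image link using only the standard library. It is not the library's implementation: the helper name extract_captcha_id, the sample URL shape (a folder segment followed by a Captcha_-prefixed file name), and the way the two parts are joined are all assumptions made for this example.

```python
import posixpath
from urllib.parse import urlparse


def extract_captcha_id(captcha_link: str) -> str:
    """Hypothetical helper: build an ID from the last folder and the file stem.

    Assumes a link shaped like
    https://host/captcha/<folder>/Captcha_<name>.jpg (an assumption).
    """
    path = urlparse(captcha_link).path
    directory, filename = posixpath.split(path)
    stem, _extension = posixpath.splitext(filename)

    # Drop the assumed "Captcha_" prefix if it is present.
    prefix = "Captcha_"
    if stem.startswith(prefix):
        stem = stem[len(prefix):]

    # Join the folder name and the remaining file stem into one ID.
    return posixpath.basename(directory) + stem
```

Joining the folder and the file stem keeps the identifier unique even if two captchas share a file name in different folders; a real implementation may choose a different scheme.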

Extracts a captcha link from an html page.

Parameters: captcha_page (str) – A page’s html in string format.
Returns: Captcha link.
Return type: str
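
The link extraction described above could be sketched with the standard html.parser module, as shown below. This is a minimal sketch, not the library's code: the function name extract_captcha_link, the rule "first img whose src contains the word captcha", and the None return for a page without a match are assumptions made for illustration.

```python
from html.parser import HTMLParser
from typing import List, Optional


class _ImageSourceParser(HTMLParser):
    """Collects the src attribute of every <img> tag it encounters."""

    def __init__(self) -> None:
        super().__init__()
        self.sources: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def extract_captcha_link(captcha_page: str) -> Optional[str]:
    """Hypothetical helper: return the first image link that mentions 'captcha'."""
    parser = _ImageSourceParser()
    parser.feed(captcha_page)
    for src in parser.sources:
        if "captcha" in src.lower():
            return src
    return None  # no captcha image found on the page
```

Using a parser rather than string splitting makes the sketch tolerant of attribute order and extra attributes on the img tag.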
get_captcha_image()

Requests the page with Amazon’s captcha and gets a random captcha from it. Creates an AmazonCaptcha instance and stores the original image before solving it.

If it is not an accuracy test, the image will be stored in a specified folder with the solution within its name. Otherwise, only the logs will be stored, mentioning the captcha link being processed and the result.
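
The two storage modes described above can be sketched as follows. This is an illustrative sketch only: the function store_result, the file name pattern <captcha_id>_<solution>.png, the log file name accuracy.log, and the log line layout are hypothetical and are not taken from the library.

```python
import os


def store_result(output_folder, captcha_link, captcha_id, solution,
                 image_bytes, accuracy_test=False):
    """Hypothetical helper: save the image, or only log the result."""
    os.makedirs(output_folder, exist_ok=True)

    if accuracy_test:
        # Accuracy test: keep only a log line with the link and the result.
        log_path = os.path.join(output_folder, "accuracy.log")
        with open(log_path, "a", encoding="utf-8") as log_file:
            log_file.write(f"{captcha_link}::{solution}\n")
        return log_path

    # Collection mode: store the image with the solution in its name.
    image_path = os.path.join(output_folder, f"{captcha_id}_{solution}.png")
    with open(image_path, "wb") as image_file:
        image_file.write(image_bytes)
    return image_path
```

Putting the solution in the file name means a collected folder can later be used as a labeled dataset without a separate index file.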

start(target, processes)

Starts the process of collecting captchas or conducting a test.

Parameters:
  • target (int) – Number of captchas to be processed.
  • processes (int) – Number of simultaneous processes.
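
To illustrate how a target might be divided among simultaneous processes, the sketch below splits the total into per-process shares. It is a sketch under stated assumptions, not the library's distribution logic: the function name split_target and the even-split-with-remainder strategy are hypothetical.

```python
from typing import List


def split_target(target: int, processes: int) -> List[int]:
    """Hypothetical helper: divide `target` captchas among `processes` workers."""
    if processes < 1:
        raise ValueError("processes must be at least 1")
    if target < 0:
        raise ValueError("target must not be negative")

    # Every worker gets the base share; the first `remainder` workers
    # take one extra captcha so the shares add up to the target.
    base, remainder = divmod(target, processes)
    return [base + (1 if index < remainder else 0) for index in range(processes)]
```

With the values from the examples (target = 200, four processes), each worker would receive 50 captchas. Each share could then be handed to a worker, for instance through multiprocessing.Pool.map.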