cmiN

[Python] RedTube link crawler [console]


For the wankers :D and for anyone who wants to embed p0rn videos.

[Python] rtc - Pastebin.com

#! /usr/bin/env python3
# redtube.com video link crawler
# 11.05.2011

import sys, urllib.request, threading, time

class RTC(threading.Thread):
    """Worker thread that collects the video links from one listing page."""

    def __init__(self, page):
        self.page = page
        threading.Thread.__init__(self)

    def run(self):
        word = b"stat['"
        with urllib.request.urlopen("http://www.redtube.com/?page=" + str(self.page)) as uin:
            source = uin.read()
        # Search only between the two template marker comments.
        start = source.find(b"<!-- Preparation templates -->")
        end = source.find(b"<!-- Preparation templates end -->")
        start = source.find(word, start, end)
        while start != -1:
            start = start + len(word)
            string = ""
            # Copy characters up to the closing quote.
            while source[start] != ord("'"):
                string += chr(source[start])
                start += 1
            self.fobj.write("http://www.redtube.com/" + string + "\n")
            start = source.find(word, start, end)

def main(argc, argv):
    if argc == 5:
        # The output file is shared by all workers through a class attribute.
        RTC.fobj = open(argv[1], "at")
        for page in range(int(argv[2]), int(argv[3]) + 1):
            # Wait for a free slot; sleep instead of spinning on the CPU.
            while threading.active_count() > int(argv[4]):
                time.sleep(0.01)
            RTC(page).start()
        # Wait until only the main thread is left.
        while threading.active_count() > 1:
            time.sleep(0.01)
        RTC.fobj.close()
    else:
        print("./rtc.py filename from to threads\n"
              "filename: path to a text file for storing links\n"
              "from: start page number\n"
              "to: end page number\n"
              "threads: how many pages to parse asynchronously\n")

if __name__ == "__main__":
    main(len(sys.argv), sys.argv)
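
As a side note, the same idea (pull the `stat['...']` values out of each page, with at most N pages in flight) could probably be written with a regular expression and a thread pool. This is only a rough sketch, not the script above: `extract_links`, `crawl` and `fake_fetch` are names made up for the example, and the fake fetcher stands in for the real `urllib.request.urlopen` call so the snippet runs offline.

```python
import re
from concurrent.futures import ThreadPoolExecutor

BASE = "http://www.redtube.com/"
START = b"<!-- Preparation templates -->"
END = b"<!-- Preparation templates end -->"

def extract_links(source):
    """Return the links found between the two marker comments of one page."""
    start = source.find(START)
    end = source.find(END)
    if start == -1 or end == -1:
        return []  # assumption: a page without both markers yields nothing
    # Capture whatever sits between stat[' and the closing quote.
    ids = re.findall(rb"stat\['([^']*)'", source[start:end])
    return [BASE + i.decode("latin-1") for i in ids]

def crawl(pages, fetch, workers):
    """Process pages with at most `workers` threads; results keep page order."""
    links = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for page_links in pool.map(lambda p: extract_links(fetch(p)), pages):
            links.extend(page_links)
    return links

# Hypothetical stand-in for the HTTP request, so no network is needed.
def fake_fetch(page):
    body = b" stat['%d'] = 0; stat['%d'] = 0; " % (page, page + 100)
    return START + body + END

print(crawl(range(1, 4), fake_fetch, workers=2))
```

For real use, `fetch` would be a function that opens the page URL and returns the raw bytes. The pool takes care of both the thread limit and the final wait, so no polling loop is needed.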


I've heard of that theory too. It probably borrows the C style, where you can't put several directives on the same line. As for speed, it seems exactly the same to me, maybe even faster (in theory). Why should I have N imports when I can have a single one, just like why should I have print("c") print("m") ... etc. when I can have print("c\nm\ni\nN")?
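
A quick way to check this, as far as I can tell: compile both import styles and compare the resulting instructions, then capture the output of both print styles. This is just a sketch for CPython; `ops` and `printed` are helper names invented for the example, and it says nothing about timing, only about what the interpreter ends up executing.

```python
import contextlib
import dis
import io

def ops(source):
    """List (opname, argval) pairs for a snippet, ignoring line numbers."""
    code = compile(source, "<demo>", "exec")
    return [(i.opname, i.argval) for i in dis.get_instructions(code)]

def printed(source):
    """Run a snippet and return everything it wrote to stdout."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(source)
    return buf.getvalue()

# One import statement versus one statement per module.
combined = "import sys, threading"
separate = "import sys\nimport threading"
same_bytecode = ops(combined) == ops(separate)

# One print call with embedded newlines versus four separate calls.
one_call = printed('print("c\\nm\\ni\\nN")')
four_calls = printed('print("c")\nprint("m")\nprint("i")\nprint("N")')
same_output = one_call == four_calls

print(same_bytecode, same_output)
```

If both comparisons come out true, the one-line import is purely a style choice, while the single print only saves a few function calls.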
