VaD_SuNeTe Posted June 27, 2013
Hi, as the title says, I need some help. I want to know what a crawler is, what it is used for, and how to build one. Thanks, I'm looking forward to serious answers.
eusimplu Posted June 27, 2013
It depends a lot on how complex the site is and how complex its URLs are.
For simple URLs, with ?moloz=1, you need to know at which number the pages stop, or what shows up when they stop.
For complicated URLs, with site.tld/moloz/nume, where nume is for example the article title, you will first need a separate crawler that records every page in the database. An example: a crawler for the categories, a crawler for the article listing pages, and only at the end the crawler for the articles themselves.
Now that the URLs are sorted out, you also have to "visit" them somehow. For that you can use two functions:
file_get_contents('http://pagina-de-vizitat.tld'); // the easy one (PHP: file_get_contents - Manual)
and curl, the hard one (PHP: cURL - Manual).
Then you use regular expressions and preg_match_all to pull what you want out of the source returned by the page you visited with curl or file_get_contents.
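The steps described above (walk numbered URLs, stop when nothing comes back, extract with preg_match_all) could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the function name crawl_numbered_pages, the example.test URLs, the link regex and the stop conditions are all assumptions made for illustration. The fetcher is passed in as a callable so the example runs offline.

```php
<?php
// Sketch only: URL pattern, stop conditions and the fake site are assumptions.

/**
 * Walk numbered pages (?moloz=1, ?moloz=2, ...) until a page is empty or missing.
 * $fetch is any callable that returns the page source, or false on failure.
 */
function crawl_numbered_pages(string $baseUrl, callable $fetch, int $maxPages = 1000): array
{
    $links = [];
    for ($n = 1; $n <= $maxPages; $n++) {
        $html = $fetch($baseUrl . '?moloz=' . $n);
        if ($html === false || $html === '') {
            break; // stop condition: nothing came back
        }
        // Collect every href="..." value found in an <a> tag.
        if (!preg_match_all('~<a\s[^>]*href="([^"]+)"~i', $html, $m)) {
            break; // stop condition: the page contains no links
        }
        foreach ($m[1] as $href) {
            $links[$href] = true; // array keys de-duplicate the links
        }
    }
    return array_keys($links);
}

// Real use would hit the network, for example:
// $links = crawl_numbered_pages('http://pagina-de-vizitat.tld/', 'file_get_contents');

// Offline demonstration: a fake fetcher stands in for the network.
$fakeSite = [
    'http://example.test/?moloz=1' => '<a href="/a.html">A</a> <a href="/b.html">B</a>',
    'http://example.test/?moloz=2' => '<a class="x" href="/b.html">B</a> <a href="/c.html">C</a>',
];
$fakeFetch = function (string $url) use ($fakeSite) {
    return $fakeSite[$url] ?? false;
};
$found = crawl_numbered_pages('http://example.test/', $fakeFetch);
print_r($found);
```

Passing the fetcher as a parameter also makes it easy to swap file_get_contents for a curl-based function later without touching the loop.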
Birkoff Posted June 27, 2013
See if this helps: Site search
FarSe Posted June 28, 2013
<?php
// Example crawler: logs in, walks the listing pages, then opens each post.
// Note: the mysql_* functions used here were removed in PHP 7.
//include('adf.php');
header("Content-type: text/plain");
include("config.php");

$ch = curl_init();
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_COOKIEJAR, 'Vivacookie.txt'); // keep the session cookie between requests
curl_setopt($ch, CURLOPT_COOKIESESSION, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
//curl_setopt($ch, CURLOPT_VERBOSE, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);           // return the body instead of printing it
curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1);
//curl_setopt($ch, CURLOPT_HEADER, 1);

// LOGIN
curl_setopt($ch, CURLOPT_URL, 'http://vivaprograms.com/index.php');
curl_setopt($ch, CURLOPT_POSTFIELDS, 'login_name=USER&login_password=PAROLA&login=submit');
$creap = curl_exec($ch);
curl_setopt($ch, CURLOPT_URL, 'http://vivaprograms.com/index.php');
$creap = curl_exec($ch);
//echo $creap;

$total = 0;
for ($pg = 50; $pg < 3500; $pg++) {
    // Listing page: one match per post (category, id, slug, title, short description).
    curl_setopt($ch, CURLOPT_URL, 'http://vivaprograms.com/page/'.$pg.'/');
    $data = curl_exec($ch);
    if (!preg_match_all('$base shortstory.*?<a href="http://vivaprograms\.com/(nulled-scripts|webmasters|templates|seo-tools|resources)/([0-9]+)-([a-z0-9-/]+)\.html">(.*?)</a></h3>.*?<div id="news-id-[0-9]+" style="display:inline;">(.*?)</div>\r\n\t\t\t<div class="clr"></div>$is', $data, $contz)) {
        echo $data; // nothing matched: dump the page and stop
        break;
    }
    for ($c = 0; $c < count($contz[0]); $c++) {
        // Skip posts that are already in the database.
        if (mysql_num_rows(mysql_query("SELECT id FROM posts WHERE source=2 AND sid='".$contz[2][$c]."';"))) continue;
        $total++;
        $Post = array();
        $Post['category'] = str_replace('nulled-', '', $contz[1][$c]);
        $Post['sid']      = $contz[2][$c];
        $Post['url']      = mysql_real_escape_string($contz[3][$c]);
        $Post['title']    = mysql_real_escape_string($contz[4][$c]);
        $Post['smaldesc'] = mysql_real_escape_string($contz[5][$c]);
        //echo $contz[0][$c];

        // Post page: author, date and full content.
        curl_setopt($ch, CURLOPT_URL, 'http://vivaprograms.com/'.$Post['category'].'/'.$Post['sid'].'-'.$Post['url'].'.html');
        $data = curl_exec($ch);
        if (preg_match('$Author:.*?http://vivaprograms\.com/user/([a-zA-Z0-9-/]+)/.*?http://vivaprograms\.com/([0-9]+/[0-9]+/[0-9]+)/.*?<div id="news-id-[0-9]+" style="display:inline;">(.*?)</div> <br /> <center>$is', $data, $dataz)) {
            $Post['author']   = mysql_real_escape_string($dataz[1]);
            // Random time of day; upper bounds lowered to 23 and 59 so the time is always valid.
            $Post['postdate'] = str_replace('/', '-', $dataz[2]).' '.rand(10,23).':'.rand(10,59).':'.rand(10,59);
            $Post['content']  = mysql_real_escape_string($dataz[3]);
        } else echo 'FUAUUU'; // the post page did not match the pattern
        //mysql_query("INSERT INTO posts VALUES(NULL,2,'{$Post['sid']}','{$Post['category']}','{$Post['title']}','{$Post['url']}','".rand(100,400)."',NOW(),'{$Post['postdate']}','{$Post['smaldesc']}','{$Post['content']}','{$Post['author']}');");
        if (mysql_errno()) echo mysql_error();
        echo "Page $pg : $total : $c ,{$Post['category']} title {$Post['title']}\n";
        //break;
    }
    //break;
}
curl_close($ch);
?>
Here's an example from the scriptwarez.net crawlers.
Edited June 28, 2013 by FarSe
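The numeric indexes in the example above ($contz[2][$c] and so on) can be hard to follow. One possible alternative, sketched here under assumptions, is to use named capture groups together with PREG_SET_ORDER so that each match arrives as one array with readable keys. The markup, the example.test URLs and the "desc" class below are invented for illustration and are not the real site's HTML.

```php
<?php
// Sketch only: the sample markup and the pattern are invented for illustration.
$html = <<<HTML
<h3><a href="http://example.test/templates/101-blue-theme.html">Blue Theme</a></h3>
<div class="desc">A simple blue template.</div>
<h3><a href="http://example.test/seo-tools/102-rank-checker.html">Rank Checker</a></h3>
<div class="desc">Checks keyword positions.</div>
HTML;

// Named groups (?<name>...) replace the numeric indexes of the captures.
$pattern = '~<a href="http://example\.test/(?<category>[a-z-]+)/(?<sid>[0-9]+)-(?<slug>[a-z0-9-]+)\.html">'
         . '(?<title>.*?)</a></h3>\s*<div class="desc">(?<desc>.*?)</div>~is';

// PREG_SET_ORDER groups the captures per match instead of per capture group.
preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

$posts = [];
foreach ($matches as $m) {
    $posts[] = [
        'category' => $m['category'],
        'sid'      => (int) $m['sid'],
        'url'      => $m['slug'],
        'title'    => $m['title'],
        'smaldesc' => $m['desc'],
    ];
}
print_r($posts);
```

Each element of $posts then maps directly onto one database row, which keeps the extraction step separate from the storage step.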
VaD_SuNeTe Posted June 28, 2013 Author
Thanks for the answers, they are useful.
UnixDevel Posted June 28, 2013
There are a few pretty decent classes for crawling, among them phpcrawler and simple html dom.
dsp77 Posted June 28, 2013
I use simplehtmldom. You can select elements from a web page the way you do with jQuery. It uses a fair amount of resources, but it is easy to use.
Another solution would be PHP's native DOM extension, which is somewhat more complex.
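For the native DOM route mentioned above, a minimal sketch might look like the following. It uses PHP's built-in DOMDocument and DOMXPath classes; the sample markup, the "post" class and the XPath queries are assumptions chosen for illustration.

```php
<?php
// Sketch only: the sample markup and the XPath queries are assumptions.
$html = '<html><body>
  <div class="post"><h2>First</h2><a href="/one.html">read</a></div>
  <div class="post"><h2>Second</h2><a href="/two.html">read</a></div>
</body></html>';

$doc = new DOMDocument();
// Real-world HTML is rarely valid; collect libxml warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$items = [];
// Roughly what the jQuery selector "div.post" would select in this sample.
foreach ($xpath->query('//div[@class="post"]') as $post) {
    $title = $xpath->query('.//h2', $post)->item(0);
    $link  = $xpath->query('.//a', $post)->item(0);
    $items[] = [
        'title' => trim($title->textContent),
        'href'  => $link->getAttribute('href'),
    ];
}
print_r($items);
```

Compared with regular expressions, this approach tends to survive small changes in whitespace or attribute order, at the cost of parsing the whole document.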
yoyois Posted June 28, 2013
crawler = "tarator" (Romanian for "something that crawls")
As the name says, it is something that crawls. Its purpose is to move through web pages and report details about them.
Crawlers are used by search engines to index web pages, or by bots that want to collect content from web pages.
In practice a crawler crawls across web pages, searching and scanning their content.
From here you get several classes of crawlers with different behaviors:
Content crawlers, used for example to grab all the jokes from a joke site (without copying them by hand).
Search and indexing crawlers, which look at the important elements of a site (title, h1, strong) and follow its links (they can visit them and build a site map), for example Google's crawlers.
And so on.
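The indexing behavior described above (read the title, follow the links, build a site map) could be sketched roughly like this. It is a simplified illustration under assumptions: the function name crawl_site and the in-memory $pages array are hypothetical, the fetcher is injected so the example runs without network access, and relative URL resolution, robots.txt and politeness delays are deliberately left out.

```php
<?php
// Sketch only: the fetcher is injected so the example runs without network access.

/**
 * Breadth-first crawl starting at $start. Returns [url => page title].
 * $fetch is any callable returning the page source, or false on failure.
 */
function crawl_site(string $start, callable $fetch, int $limit = 100): array
{
    $queue = [$start];
    $seen  = [$start => true];
    $index = [];
    while ($queue && count($index) < $limit) {
        $url  = array_shift($queue);
        $html = $fetch($url);
        if ($html === false) {
            continue; // unreachable page: skip it
        }
        // Indexing step: keep the <title> of each page.
        $index[$url] = preg_match('~<title>(.*?)</title>~is', $html, $t) ? trim($t[1]) : '';
        // Link-following step: queue every link that has not been seen yet.
        preg_match_all('~href="([^"#]+)"~i', $html, $m);
        foreach ($m[1] as $href) {
            if (!isset($seen[$href])) {
                $seen[$href] = true;
                $queue[] = $href;
            }
        }
    }
    return $index;
}

// A tiny fake site held in memory (hypothetical pages).
$pages = [
    '/'        => '<title>Home</title><a href="/jokes">jokes</a> <a href="/about">about</a>',
    '/jokes'   => '<title>Jokes</title><a href="/">home</a> <a href="/jokes/1">first</a>',
    '/about'   => '<title>About</title><a href="/">home</a>',
    '/jokes/1' => '<title>Joke 1</title>',
];
$siteMap = crawl_site('/', function (string $url) use ($pages) {
    return $pages[$url] ?? false;
});
print_r($siteMap);
```

The $seen array is what keeps the crawler from looping forever when pages link back to each other, and $limit caps how many pages get indexed.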
shaggi Posted June 28, 2013
Here's a mini crawler example: <?php $RST = file_get_contents("https://rstforums.com"); $RST = explode(" <div - Pastebin.com
VaD_SuNeTe Posted June 28, 2013 Author
Quote: Here's a mini crawler example: <?php $RST = file_get_contents("https://rstforums.com"); $RST = explode(" <div - Pastebin.com
For the site auto.ro: it should pull the data (make, description, photo, price, listing title, model) and put it into a simple database.