[C++] Get webpage as text?

Posted

Hi!

I'm not good at web programming in C++. I know C/C++ at an intermediate level, but not web or Windows programming in C/C++.

I remember that a while ago I wrote a small program in VB.NET, and I think in C++ too (I'm not sure), that connected to a MySQL database. Some time has passed since then.

Now I'd like, if possible and if I'm not bothering you over the holidays, for someone to write me the code for a C/C++ program that reads a page from a website so that I can extract certain data from it with regex.

I'd also really appreciate it if you could comment the code with explanations, so I know what each part does (not every line as for a beginner, since I do know a thing or two, but especially the web programming part).

Thanks a lot!

Posted

Hi,

Why do you want to do it in C++? Just to learn the language, or do you want to download tens of thousands of pages?

I would use a scripting language. Probably bash + sed, or Python.

Posted

Do you want to run the program on Windows?

Posted (edited)

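#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <stdio.h>

#define USERAGENT "JIHAD HTTP AGENT"

/* Resolve hostme, open a TCP connection to it on the given port and
   return the socket descriptor, or -1 on failure. */
int socket_connect (const char *hostme, int port)
{
    struct timeval tv;
    struct hostent *host;
    struct sockaddr_in server;
    int handle;

    /* DNS lookup (IPv4 only) */
    host = gethostbyname (hostme);
    if (host == NULL) {
        fprintf (stderr, "Could not resolve host %s.\n", hostme);
        return -1;
    }

    handle = socket (AF_INET, SOCK_STREAM, IPPROTO_TCP);
    if (handle == -1)
        return -1;

    /* 3-second timeout for both receiving and sending */
    tv.tv_sec = 3;
    tv.tv_usec = 0;
    setsockopt (handle, SOL_SOCKET, SO_RCVTIMEO, (char *) &tv, sizeof (tv));
    setsockopt (handle, SOL_SOCKET, SO_SNDTIMEO, (char *) &tv, sizeof (tv));

    /* Server address: family, port in network byte order, resolved IP */
    memset (&server, 0, sizeof (server));
    server.sin_family = AF_INET;
    server.sin_port = htons (port);
    memcpy (&server.sin_addr, host->h_addr_list[0], host->h_length);

    if (connect (handle, (struct sockaddr *) &server, sizeof (server)) == -1) {
        close (handle);
        return -1;
    }
    return handle;
}


/* Send "GET page" to hostname on port 80 and print the whole response
   (headers and body). Returns 0 on success, 1 on error. */
int checkpage (const char *hostname, const char *page)
{
    int socket_desc;
    size_t cap = 65536, len = 0;
    ssize_t n;
    char *buf, *tmp;
    char message[2000];

    socket_desc = socket_connect (hostname, 80);
    if (socket_desc == -1)
        return 1;

    /* Build the request; "Connection: close" asks the server to close the
       socket after the response, which is what ends the recv loop below */
    snprintf (message, sizeof (message),
              "GET %s HTTP/1.1\r\nHost: %s\r\nUser-Agent: %s\r\nConnection: close\r\n\r\n",
              page, hostname, USERAGENT);
    if (send (socket_desc, message, strlen (message), 0) < 0) {
        close (socket_desc);
        return 1;
    }

    buf = malloc (cap);
    if (buf == NULL) {
        close (socket_desc);
        return 1;
    }

    /* Read until the server closes the connection, growing buf as needed
       and always leaving room for the terminating '\0' */
    while ((n = recv (socket_desc, buf + len, cap - len - 1, 0)) > 0) {
        len += (size_t) n;
        if (cap - len < 2) {
            tmp = realloc (buf, cap * 2);
            if (tmp == NULL)
                break;
            buf = tmp;
            cap *= 2;
        }
    }
    buf[len] = '\0';
    close (socket_desc);

    /* Never pass received data to printf as the format string */
    fputs (buf, stdout);
    free (buf);
    return 0;
}


int main (int argc, char **argv)
{
    if (argc != 3) {
        fprintf (stderr, "Usage: %s <host> <path>\n", argv[0]);
        return 1;
    }
    return checkpage (argv[1], argv[2]);
}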

Run it like this:

./http yahoo.com /

Output:

root@superstars:~/new-proj# ./http yahoo.com /
HTTP/1.1 301 Redirect
Date: Fri, 26 Dec 2014 14:18:44 GMT
Via: http/1.1 ir7.fp.ne1.yahoo.com (ApacheTrafficServer)
Server: ATS
Location: https://www.yahoo.com//
Content-Type: text/html
Content-Language: en
Cache-Control: no-store, no-cache
Connection: keep-alive
Content-Length: 1450

<!DOCTYPE html>
<html lang="en-us"><head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<meta charset="utf-8">
<title>Yahoo</title>
<meta name="viewport" content="width=device-width,initial-scale=1,minimal-ui">
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
<style>
html {
height: 100;
}
body {
background: #fafafc url(https://s.yimg.com/nn/img/sad-panda-201402200631.png) 50 50;
background-size: cover;
height: 100;
text-align: center;
font: 300 18px "helvetica neue", helvetica, verdana, tahoma, arial, sans-serif;
}
table {
height: 100;
width: 100;
table-layout: fixed;
border-collapse: collapse;
border-spacing: 0;
border: none;
}
h1 {
font-size: 42px;
font-weight: 400;
color: #400090;
}
p {
color: #1A1A1A;
}
#message-1 {
font-weight: bold;
margin: 0;
}
#message-2 {
display: inline-block;
*display: inline;
zoom: 1;
max-width: 17em;
_width: 17em;
}
</style>
</head>
<body>
<!-- status code : 301 -->
<!-- Error: GET -->
<table>
<tbody><tr>
<td>
<img src="https://s.yimg.com/nn/img/yahoo-logo-201402200629.png" alt="Yahoo Logo">
<h1 style="margin-top:20px;">Will be right back...</h1>
<p id="message-1">Thank you for your patience.</p>
<p id="message-2">Our engineers are working quickly to resolve the issue.</p>
</td>
</tr>
</tbody></table>


</body></html>

SSL won't work with this; for HTTPS you need to add OpenSSL.
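
The output above shows why this matters: the server answers with a 301 whose Location header points to an https:// URL, which a plain-socket client cannot follow. Below is a minimal sketch, in C++ since that is what the thread asks for, of how a program might detect that case from the raw response text. The helper names http_status and http_header are made up for illustration and are not part of the posted code; the sketch also assumes the whole response is already in a std::string.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: numeric status code from the first line of a raw
// HTTP response ("HTTP/1.1 301 Redirect"), or -1 if it looks malformed.
int http_status(const std::string& response) {
    if (response.compare(0, 5, "HTTP/") != 0) return -1;
    std::size_t sp = response.find(' ');
    if (sp == std::string::npos || sp + 4 > response.size()) return -1;
    int code = 0;
    for (std::size_t i = sp + 1; i < sp + 4; ++i) {
        if (!std::isdigit(static_cast<unsigned char>(response[i]))) return -1;
        code = code * 10 + (response[i] - '0');
    }
    return code;
}

// Case-insensitive comparison of `name` with the text starting at s[pos].
static bool iequal_at(const std::string& s, std::size_t pos, const std::string& name) {
    if (pos + name.size() > s.size()) return false;
    for (std::size_t i = 0; i < name.size(); ++i) {
        if (std::tolower(static_cast<unsigned char>(s[pos + i])) !=
            std::tolower(static_cast<unsigned char>(name[i]))) return false;
    }
    return true;
}

// Hypothetical helper: value of the first header called `name`
// (header names are case-insensitive), or "" if it is absent.
std::string http_header(const std::string& response, const std::string& name) {
    const std::size_t npos = std::string::npos;
    std::size_t end_of_headers = response.find("\r\n\r\n");
    if (end_of_headers == npos) end_of_headers = response.size();
    std::size_t pos = response.find("\r\n");  // end of the status line
    while (pos != npos && pos < end_of_headers) {
        std::size_t line_start = pos + 2;
        std::size_t line_end = response.find("\r\n", line_start);
        if (line_end == npos) line_end = response.size();
        std::size_t colon = response.find(':', line_start);
        if (colon != npos && colon < line_end &&
            colon - line_start == name.size() &&
            iequal_at(response, line_start, name)) {
            std::size_t v = colon + 1;  // skip whitespace after the colon
            while (v < line_end && (response[v] == ' ' || response[v] == '\t')) ++v;
            return response.substr(v, line_end - v);
        }
        pos = response.find("\r\n", line_start);
    }
    return "";
}
```

With these, a caller could check whether http_status returns 301 or 302 and whether http_header(response, "Location") starts with "https://", and report that the redirect needs TLS instead of printing the placeholder page.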

The code is for Unix; it is easy to port to Winsock, since only the socket part changes.

I'm not going to explain it; read it and try to understand it.
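
For the Winsock port mentioned above, the main differences are that Winsock has to be initialised with WSAStartup before any socket call, sockets are closed with closesocket instead of close, the descriptor type is SOCKET rather than int, and SO_RCVTIMEO / SO_SNDTIMEO take a DWORD in milliseconds instead of a struct timeval. bzero is also unavailable on Windows, so memset is the portable choice. A small sketch of a shim that hides these differences might look like the following; the names socket_t, kInvalidSocket, net_init, net_cleanup, net_close and net_set_timeout are invented for this example.

```cpp
#ifdef _WIN32
  #include <winsock2.h>
  #include <ws2tcpip.h>
  // Link with Ws2_32.lib (MSVC) or -lws2_32 (MinGW).
  typedef SOCKET socket_t;
  const socket_t kInvalidSocket = INVALID_SOCKET;
#else
  #include <sys/types.h>
  #include <sys/socket.h>
  #include <sys/time.h>
  #include <unistd.h>
  typedef int socket_t;
  const socket_t kInvalidSocket = -1;
#endif

// Winsock must be initialised once per process; Unix needs nothing.
bool net_init() {
#ifdef _WIN32
    WSADATA data;
    return WSAStartup(MAKEWORD(2, 2), &data) == 0;
#else
    return true;
#endif
}

// Counterpart of net_init(); a no-op on Unix.
void net_cleanup() {
#ifdef _WIN32
    WSACleanup();
#endif
}

// closesocket() on Windows, close() on Unix.
void net_close(socket_t s) {
#ifdef _WIN32
    closesocket(s);
#else
    close(s);
#endif
}

// Apply the same timeout to receiving and sending.
bool net_set_timeout(socket_t s, int seconds) {
#ifdef _WIN32
    DWORD ms = static_cast<DWORD>(seconds) * 1000;  // milliseconds on Windows
    const char* opt = reinterpret_cast<const char*>(&ms);
    return setsockopt(s, SOL_SOCKET, SO_RCVTIMEO, opt, sizeof(ms)) == 0 &&
           setsockopt(s, SOL_SOCKET, SO_SNDTIMEO, opt, sizeof(ms)) == 0;
#else
    timeval tv;
    tv.tv_sec = seconds;
    tv.tv_usec = 0;
    return setsockopt(s, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) == 0 &&
           setsockopt(s, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv)) == 0;
#endif
}
```

The rest of the posted code (gethostbyname, socket, connect, send, recv) exists under the same names in Winsock, so it should need little change beyond using socket_t for the descriptor and calling net_init() at the start of main.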

Edited by JIHAD
Posted
"Hi, why do you want to do it in C++? Just to learn the language, or do you want to download tens of thousands of pages? I would use a scripting language. Probably bash + sed, or Python."

I want to use C++ because from now on I want to program only in C++. No more VB.NET, VB6 and other nonsense. To me C++ is the coolest and most logical of them all, even though you have to write a lot more.

This is part of a much larger project, which is also in C++.

"Do you want to run the program on Windows?"

Yes.

@JIHAD

Thank you!

If I want to extract only a certain thing from the result with regex, I'll have to work on the "printf(buf);" part, right?
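
In short, yes: at that point buf holds the entire response (status line, headers, blank line, then the HTML), so the regex would be applied to buf instead of printing it. Below is a rough sketch in C++ using std::regex, assuming the response has been copied into a std::string. The function names http_body and find_all and the patterns in the usage note are only illustrative, and a response sent with chunked transfer encoding would need decoding first, which this sketch does not attempt.

```cpp
#include <regex>
#include <string>
#include <vector>

// Everything after the first blank line ("\r\n\r\n") is the body;
// returns an empty string if no blank line is found.
std::string http_body(const std::string& response) {
    std::size_t pos = response.find("\r\n\r\n");
    return pos == std::string::npos ? std::string() : response.substr(pos + 4);
}

// Returns capture group 1 of every match of `pattern` in `text`.
// The pattern uses the default ECMAScript grammar, matched case-insensitively.
std::vector<std::string> find_all(const std::string& text, const std::string& pattern) {
    std::regex re(pattern, std::regex::icase);
    std::vector<std::string> out;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it) {
        out.push_back((*it)[1].str());
    }
    return out;
}
```

For example, find_all(http_body(response), "<title>([^<]*)</title>") would collect the page title, and a pattern such as "<p id=\"message-\\d\">([^<]*)</p>" would pick up the paragraphs with those ids. Regex works for simple, predictable markup like this; for arbitrary HTML a real parser is usually the more robust choice.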
