popdms

Python - extracting emails from a text file


Hello!

I've been searching forums for a way to extract email addresses from a text file.

I wrote the script in Python and used regular expressions (regex).

I can extract addresses that contain no spaces, but I haven't managed to extract one written like this: name2 @ email . com

I need code that extracts an address whether it is written correctly (without spaces) or with spaces (as in the example above).

Below are a sample text file and the code I wrote:

 

xxx test1@gmail.com xxxx
xxxxxxxx
xxx test2 @ email . com xxx
xxxxxxx
xxx name1.name2.mm@email.co.uk xxx
xxxxxxxxxxx

 

 

The code:
 

import re

# Print the first address-like token found on each line of the file
with open('emails.txt') as f:
    for line in f:
        z = re.findall(r'[\w.-]+@[\w.-]+', line)
        if z:
            print(z[0])

It returns only:

test1@gmail.com
name1.name2.mm@email.co.uk


I think the cleanest solution is to use groups when matching an email address and to ignore the groups that may or may not contain spaces.
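One way to read the groups suggestion, as a rough sketch only: capture the local part and the domain in separate groups, allow optional whitespace around the "@" and around each dot of the domain, then rebuild the address from the groups. The pattern and the names EMAIL_WITH_GAPS and extract_emails are my own assumptions, not something posted in this thread.

```python
import re

# Hypothetical pattern: group 1 is the local part, group 2 the domain.
# \s* tolerates optional whitespace around "@" and around each domain dot.
EMAIL_WITH_GAPS = re.compile(r'([\w.-]+)\s*@\s*([\w-]+(?:\s*\.\s*[\w-]+)+)')

def extract_emails(text):
    """Return the addresses found in text, with inner whitespace removed."""
    found = []
    for match in EMAIL_WITH_GAPS.finditer(text):
        local, domain = match.groups()
        # Only the domain group can contain whitespace, so clean just that
        found.append(local + '@' + re.sub(r'\s+', '', domain))
    return found

sample = """xxx test1@gmail.com xxxx
xxxxxxxx
xxx test2 @ email . com xxx
xxxxxxx
xxx name1.name2.mm@email.co.uk xxx
xxxxxxxxxxx
"""

print(extract_emails(sample))
# ['test1@gmail.com', 'test2@email.com', 'name1.name2.mm@email.co.uk']
```

This sketch does not handle spaces inside the local part (for example "name1 . name2 @ ..."), and it is far looser than the real address grammar.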

 

Alternatively, you can use a separate text editor to remove the spaces first.

For example, search & replace in Notepad++:

- find: ( ?)(@)( ?)(\w+)( ?)(\.)( ?)

- replace: $2$4$6

This removes a single space on either side of the @ and of the first dot after it.
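The same clean-up can probably be done in Python itself with re.sub, so no separate editor is needed. This is only a sketch: collapse_spaces is a name I made up, only the groups that are kept are captured, and Python writes backreferences as \1 where Notepad++ uses $1.

```python
import re

def collapse_spaces(text):
    # Same idea as the Notepad++ rule: drop one optional space on each side
    # of "@" and of the first dot after it, keeping the captured pieces.
    return re.sub(r' ?(@) ?(\w+) ?(\.) ?', r'\1\2\3', text)

line = 'xxx test2 @ email . com xxx'
cleaned = collapse_spaces(line)
print(cleaned)                                   # xxx test2@email.com xxx
print(re.findall(r'[\w.-]+@[\w.-]+', cleaned))   # ['test2@email.com']
```

After the clean-up, the pattern from the first post works unchanged on the cleaned line.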


Try this:

(([\w.-]+\s*[.-]+\s*)+\w+|\w+)\s*@\s*([\w.-]+\s*[.-]+\s*)+\w+

 

Tested with: http://sprunge.us/dGTb?py

 

In the groups before the "@" that contain ".-", you can also add other special characters such as !#$%&'*+-/=?^_`{|}~, but make sure to escape them first. More details at https://en.wikipedia.org/wiki/Email_address -> local part.
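For reference, here is a small sketch of how that pattern could be plugged into Python. The pattern is the one posted above, unchanged; the names PATTERN and find_spaced_emails and the whitespace-stripping step are my own additions.

```python
import re

# The pattern posted above; \s* allows spaces around "@", "." and "-".
PATTERN = re.compile(
    r'(([\w.-]+\s*[.-]+\s*)+\w+|\w+)\s*@\s*([\w.-]+\s*[.-]+\s*)+\w+'
)

def find_spaced_emails(line):
    # finditer is used instead of findall because the pattern has capture
    # groups, and findall would return the groups rather than the full match.
    return [re.sub(r'\s+', '', m.group(0)) for m in PATTERN.finditer(line)]

lines = [
    'xxx test1@gmail.com xxxx',
    'xxx test2 @ email . com xxx',
    'xxx name1.name2.mm@email.co.uk xxx',
]
for line in lines:
    print(find_spaced_emails(line))
# ['test1@gmail.com']
# ['test2@email.com']
# ['name1.name2.mm@email.co.uk']
```

One trade-off to keep in mind: because spaces are allowed around dots in the local part, a sentence ending right before an address gets pulled in, so 'end. test@x.com' comes back as 'end.test@x.com'.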


Here's something in VB that did what you're after; I'm sharing it in case it helps:

 

    Private Function GetMails(ByVal source As String) As String()
        ' Requires: Imports System.Text.RegularExpressions
        Dim Valid As MatchCollection
        Dim n As Integer
        ListBox1.Items.Clear()

        Valid = Regex.Matches(source, "([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})")
        Dim results(Valid.Count - 1) As String
        For n = 0 To results.Length - 1
            ' Store each match so the returned array is actually filled
            results(n) = Valid(n).Value
            ListBox1.Items.Add(Valid(n).Value)
        Next

        Return results
    End Function

 
