popdms, posted February 8, 2016 (edited February 8, 2016 by popdms)

Hello! I have been searching forums for a way to extract email addresses from a text file. I wrote the little program in Python and used regular expressions (regex). I can extract only emails that contain no spaces; I have not managed to extract an email like this one:

name2 @ email . com

I need code that extracts the email whether it is written correctly (without spaces) or with spaces (as in the example above). Below are a sample text file and the code I wrote:

xxx test1@gmail.com xxxx xxxxxxxx
xxx test2 @ email . com xxx xxxxxxx
xxx name1.name2.mm@email.co.uk xxx xxxxxxxxxxx

The code:

import re
a = open('emails.txt')
for line in a:
    line = line.rstrip()
    if re.search(r'[\w.-]+@[\w.-]+',line):
        z = re.findall(r'[\w.-]+@[\w.-]+',line)
        print (z[0])

It returns only:

test1@gmail.com
name1.name2.mm@email.co.uk
dooma, posted February 8, 2016 (edited February 8, 2016 by dooma)

Email validation is more complex than it looks. The answer linked here has explanations plus a regular expression at the end. If you dig further, there is also a site that compares different regular expressions for emails.

http://stackoverflow.com/a/201378/1886542
Eminemu, posted February 9, 2016

I think the best solution is to use groups when you search for an email and to ignore the groups that may or may not contain spaces. Alternatively, you can use a separate text editor to remove the spaces first. For example, search & replace in Notepad++:

- find: ( ?)(@)( ?)(\w+)( ?)(\.)( ?)
- replace: $2$4$6

This removes the spaces around @ and around the first . that follows it.
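The same normalize-then-extract idea can be sketched directly in Python, so no separate editor is needed. This is only a rough sketch: the names `SPACED`, `EMAIL` and `extract_emails` are made up for illustration, the find pattern is the Notepad++ one with the unused groups dropped, and the extraction pattern is the one from the original question.

```python
import re

# Same pattern as in the original question: matches addresses without spaces.
EMAIL = re.compile(r'[\w.-]+@[\w.-]+')

# Python version of the Notepad++ find/replace: optional single spaces around
# "@" and around the first "." after it are dropped. The three groups kept
# here correspond to $2, $4 and $6 in the editor version.
SPACED = re.compile(r' ?(@) ?(\w+) ?(\.) ?')


def extract_emails(line):
    """Normalize spaced-out addresses, then extract with the simple pattern."""
    normalized = SPACED.sub(r'\1\2\3', line)
    return EMAIL.findall(normalized)


# Sample lines taken from the question (hypothetical stand-in for emails.txt).
sample = [
    'xxx test1@gmail.com xxxx xxxxxxxx',
    'xxx test2 @ email . com xxx xxxxxxx',
    'xxx name1.name2.mm@email.co.uk xxx xxxxxxxxxxx',
]
for line in sample:
    for email in extract_emails(line):
        print(email)
```

Keep in mind that only the first dot after @ is handled, so a spaced-out multi-part domain such as "test @ email . co . uk" would come out truncated as test@email.co.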
cmiN, posted February 9, 2016

Try this one:

(([\w.-]+\s*[.-]+\s*)+\w+|\w+)\s*@\s*([\w.-]+\s*[.-]+\s*)+\w+

Tested with: http://sprunge.us/dGTb?py

In the groups before "@" that contain ".-", you can also add other special characters such as !#$%&'*+-/=?^_`{|}~ but make sure you escape them first. More details at https://en.wikipedia.org/wiki/Email_address -> local part.
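A minimal sketch of how this pattern could be plugged into the script from the question. The pattern is copied unchanged from the post; the names `SPACED_EMAIL` and `find_emails` are hypothetical, and this is not the code behind the sprunge.us link. Because the pattern contains capturing groups, `re.findall` would return tuples of groups, so the sketch uses `finditer` and takes the whole match, then strips the whitespace inside it.

```python
import re

# Pattern copied unchanged from the post above.
SPACED_EMAIL = re.compile(
    r'(([\w.-]+\s*[.-]+\s*)+\w+|\w+)\s*@\s*([\w.-]+\s*[.-]+\s*)+\w+'
)


def find_emails(line):
    """Return every match in `line` with its internal whitespace removed."""
    return [re.sub(r'\s+', '', m.group(0)) for m in SPACED_EMAIL.finditer(line)]


# Sample lines taken from the question (hypothetical stand-in for emails.txt).
sample = [
    'xxx test1@gmail.com xxxx xxxxxxxx',
    'xxx test2 @ email . com xxx xxxxxxx',
    'xxx name1.name2.mm@email.co.uk xxx xxxxxxxxxxx',
]
for line in sample:
    for email in find_emails(line):
        print(email)
```

One thing to watch for: since spaces are tolerated around dots, ordinary prose such as "see. foo@bar.com" is also taken as a single address and comes out as see.foo@bar.com, so this fits best on input where a word ending in a period does not sit right before an address.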
Meteosensibilul, posted February 9, 2016

Here is something in VB that did what you are after; I am showing it to you in case it helps:

' Requires "Imports System.Text.RegularExpressions" at the top of the file
Private Function GetMails(ByVal source As String) As String()
    Dim Valid As MatchCollection
    Dim n As Integer
    ListBox1.Items.Clear()
    Valid = Regex.Matches(source, "([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})")
    Dim results(Valid.Count - 1) As String
    For n = 0 To results.Length - 1
        ' Store each match so the returned array is actually filled
        results(n) = Valid(n).Value
        ListBox1.Items.Add(results(n))
    Next
    Return results
End Function