Basic CAPTCHA Recognition Method and Source Code

Background

A friend of mine had been building something that was already in good shape and wanted to polish it a little further, so he suggested knocking out this kind of CAPTCHA. And so this CAPTCHA got knocked out. Recognition takes less than 200 ms per image, and a manual count over 500 samples gave an accuracy of 95%. I had no prior experience in this area and was feeling my way forward. In the spirit of sharing experience, here is the whole line of analysis. My apologies to the experts for this humble attempt.

Some recognition results

[Figure: sample CAPTCHA images alongside their recognized text]

Look familiar?

Step 1: Removing background noise and binarizing

I considered several methods for this part.

Method 1: compute the color distribution of the image and treat colors with a low share as background noise. Because the background noise and the foreground color are not clearly separated, none of the sampling schemes I tried removed the noise well, and I eventually gave up on this method.

Method 2: a quick search online afterwards showed that the currently popular approach is to convert to grayscale and then binarize against a threshold. A grayscale image is really just a weighted sum in which the weights reflect the human eye's sensitivity to each color, and those weights mean nothing to a computer. A moment's thought shows that the two operations can be merged completely, so I removed the background noise and binarized in a single step. The threshold is a sum of the three RGB components of 500. The result was very satisfying.
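The one-step thresholding described above can be sketched as follows. This is a minimal illustration rather than the article's exact code: the class `BinarizeSketch`, the packed `0xRRGGBB` integer input, and the optional `threshold` parameter are assumptions made so the example runs without an image library. It treats a pixel as foreground when R + G + B is below 500, which is the direction used by the `ToTable` method in the source listing.

```csharp
using System;

public static class BinarizeSketch
{
    // Hypothetical helper: rgb[x, y] holds one pixel packed as 0xRRGGBB.
    // A pixel becomes foreground (true) when R + G + B is below the threshold.
    public static bool[,] ToTable(int[,] rgb, int threshold = 500)
    {
        int width = rgb.GetLength(0);
        int height = rgb.GetLength(1);
        var table = new bool[width, height];
        for (int x = 0; x < width; x++)
        {
            for (int y = 0; y < height; y++)
            {
                int r = (rgb[x, y] >> 16) & 0xFF;
                int g = (rgb[x, y] >> 8) & 0xFF;
                int b = rgb[x, y] & 0xFF;
                // No grayscale weights: the plain channel sum is compared directly.
                table[x, y] = r + g + b < threshold;
            }
        }
        return table;
    }

    public static void Main()
    {
        // Channel sums: 96 and 720 in the first row, 600 and 450 in the second.
        var pixels = new int[,] { { 0x202020, 0xF0F0F0 }, { 0xC8C8C8, 0x6496C8 } };
        var table = ToTable(pixels);
        foreach (bool cell in table)
            Console.Write(cell ? '#' : '.');
        Console.WriteLine();
    }
}
```

Light background pixels (sums of 600 and 720 here) fall away, while the dark and mid-tone pixels (96 and 450) survive as foreground, so noise removal and binarization happen in the same comparison.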

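[Figure: the sample images after background removal and binarization, saved as .bmp files]

Step 2: Building character samples

Samples matter a great deal to a computer, because a computer can hardly reason, and even if it could, it would need long training before it produced results you were happy with. So recognition has to compare against samples prepared in advance. If you look at these CAPTCHAs closely you will notice a flaw: almost all of them use the same font. I therefore built a set of samples for that font by hand. Since the previous step already produces images with the background noise removed, those can be used directly. Building the samples is simple and tedious, but it needs care: a single slip can lower the recognition rate for a particular character. Only 31 distinct characters turned up in the 500 samples. Fortunately, someone at a certain department had thought about easily confused characters such as 1 and I, or 0 and O. Otherwise that department would be taking even more blame.

Step 3: Matching

Matching a single character uses the simplest, most primitive binary comparison, but what is scored is a match rate rather than a match count. I defined a scoring rule for it. The general principle is: "a pixel that should be there and is earns points; a pixel that should be there and is missing loses points; a pixel that should not be there but is loses a moderate amount; positions outside the reachable area are not scored."

Because a partial region of some characters matches about as well as the complete shape of others, the single-character match has to pick the best candidate over a widened region. Within a certain range, find the best match; that best match is the character at the current position. After a successful best match, the match position moves a large step to the right; if no suitable best match is found, it moves a small step to the right.

Step 4: Optimizing and tuning

Every algorithm needs optimizing and tuning. Finding the best parameter configuration and the best way to organize the code is often the step that takes the most time and effort.

Step 5: Verifying the results

This step is purely manual: check the results by hand and count the accuracy.

Thoughts

The results are in, there is not much code, and it works well. People in this line of work often want something general-purpose. Whether something can be general depends largely on the level of abstraction. This method is plain matching, so naturally it is not general, but the approach and the ideas are. Each case has to be analyzed on its own terms. Distorted text, hollow text and the like are much more complicated to handle. There are also methods online that use third-party image libraries, and those may be more general. When I have the time and the interest I will pick this topic up again.

Source code

I went back and forth for a while on whether to publish this source. Similar commercial services already exist online, the recognition itself is not very difficult, and given the inherent flaw in the system in question, this CAPTCHA is practically equivalent to having none. So I am publishing the code, for learning and exchange only.

    using System.Collections.Generic;
    using System.Drawing;
    using System.IO;
    using System.IO.Compression;

    namespace Crack12306Captcha
    {
        public class Cracker
        {
            List<CharInfo> words_ = new List<CharInfo>();

            public Cracker()
            {
                // Gzip-compressed character samples, embedded as a byte array.
                var bytes = new byte[]
                {
                    0x1f, 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x04, 0x00, 0x97, 0x2f, 0xe1, 0x58,
                    0xc5, 0x58, 0xd9, 0x92, 0x13, 0x31, 0x0c, 0x94, 0x9e, 0x93, 0x0c, 0xe0, 0x91, 0x9b, 0x82, 0x62,
                    0x0b, 0x58, 0xee, 0xff, 0xff, 0x10, 0x00, 0x61, 0xd8,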
                    0xcc, 0xc8, 0xea, 0x96, 0x6c, 0x8f, 0x13, 0x48, 0xe1, 0xaa, 0x4d, 0x46, 0x96, 0x6d, 0xb5, 0x8e,
                    0x96, 0x67, 0x73, 0x7f, 0x3b, 0x09, 0x0e, 0x25, 0x41, 0x49, 0xa3, 0xae, 0xd7, 0x5b, 0xa9, 0xa8,
                    0xd5, 0xb4, 0x76, 0x02, 0x6a, 0x5c, 0x52, 0x94, 0x54, 0xed, 0x18, 0x5a, 0x7f, 0x18, 0x00, 0x00,
                    0x84, 0x07, 0x1b, 0x80, 0x4a, 0x9a, 0x08, 0x35, 0xb8, 0x81, 0x50, 0xe7, 0xad, 0xbe, 0xc4, 0x8e,
                    0xb1, 0x4f, 0x2d, 0x5f, 0xba, 0x80, 0xbb, 0xfd, 0x9a, 0xad, 0x19, 0x36, 0xe5, 0xad, 0x87, 0xf1,
                    0x10, 0xc0, 0x8d, 0xc6, 0x50, 0x40, 0x52, 0xf8, 0xb3, 0x98, 0x2c, 0xd6, 0xec, 0x59, 0xe7, 0x0d,
                    0x3e, 0x0f, 0x93, 0x3e, 0x1d, 0x02, 0x7a, 0x18, 0x8f, 0xb6, 0xc7, 0x46, 0x4e, 0x01, 0xa3, 0x96,
                    0xdc, 0x3a, 0x20, 0x77, 0xbf, 0x2c, 0x24, 0xe4, 0x80, 0xa9, 0x20, 0x14, 0xe5, 0x2d, 0xb5, 0x68,
                    0xc9, 0x55, 0x89, 0x23, 0x96, 0x82, 0xaa, 0xba, 0x58, 0xa6, 0x03, 0x38, 0x71, 0x4b, 0x29, 0xd2,
                    0x47, 0x80, 0xe3, 0x84, 0x91, 0xf4, 0x78, 0x43, 0x64, 0x41, 0x7b, 0x73, 0x99, 0x80, 0x42, 0x48,
                    0x00, 0xde, 0x00, 0x12, 0x88, 0x80, 0xdb, 0x51, 0x4a, 0x49, 0x84, 0x43, 0xf6, 0x51, 0x90, 0x27,
                    0x21, 0xc9, 0xf8, 0xac, 0x00, 0x4d, 0xcd, 0x46, 0x09, 0x9d, 0x15, 0x78, 0xe0, 0x00, 0x1e, 0x44,
                    0x2a, 0x51, 0x8c, 0xbc, 0xd3, 0xa3, 0x68, 0x8a, 0xd5, 0x3a, 0x20, 0x79, 0xba, 0x4d, 0x71, 0x4c,
                    0x0b, 0x91, 0x98, 0x90, 0x7b, 0x2a, 0x42, 0xc5, 0x78, 0x7a, 0xfc, 0xd5, 0x1b, 0x4b, 0x09, 0xa7,
                    0x27, 0x99, 0x38, 0x05, 0x01, 0xc2, 0x80, 0x39, 0x9c, 0x67, 0xbb, 0x4e, 0x7f, 0x6c, 0x33, 0xdd,
                    0xed, 0x87, 0x55, 0xda, 0x5d, 0xb5, 0x56, 0x33, 0xc6, 0xf9, 0xea, 0x60, 0x64, 0xcf, 0xa7, 0x41,
                    0xe0, 0x5c, 0x1c, 0xc4, 0xb2, 0x25, 0xa3, 0x89, 0x88, 0x8d, 0x16, 0x00, 0xb5, 0xed, 0xa5, 0x22,
                    0x9d, 0x52, 0x41, 0x53, 0x8d, 0x92, 0x7f, 0x31, 0x51, 0x3f, 0xa8, 0x00, 0x85, 0x8a, 0x71, 0x10,
                    0x92, 0x78, 0xc4, 0x59, 0x08, 0x39, 0x69, 0xa9, 0x38, 0x41, 0x48, 0xf7, 0x40, 0x5a, 0x03, 0xd5,
                    0x3a, 0xf5, 0xe5, 0x9d, 0x33, 0x66, 0xc3, 0xd7, 0x1f, 0xef, 0x94, 0xa0, 0x53, 0xea, 0xf4, 0x15,
                    0xb2, 0x1c, 0x40, 0x2d, 0xcf, 0xaf, 0xce, 0xe9, 0xd4, 0x7a, 0x89, 0x09, 0xe6, 0xdd, 0xdb, 0x0e,
                    0xb8, 0x58, 0xa7, 0x60, 0x37, 0xfd, 0xf2, 0xfa, 0x2c, 0x4e, 0x51, 0x87, 0x0d, 0xfc, 0x16, 0x72,
                    0x2a, 0x5f, 0xc0, 0x80, 0xf0, 0x54, 0xa7, 0xde, 0xfc, 0x15, 0x8b, 0x9a, 0x36, 0x3a, 0x2c, 0x62,
                    0xfc, 0xd4, 0x8c, 0x31, 0xb7, 0xea, 0xd7, 0x26, 0xc4, 0xaf, 0x75, 0xea, 0xdb, 0x8b, 0xff, 0x9b,
                    0x9b, 0x50, 0x7e, 0xfe, 0x15, 0xab, 0x17, 0x2f, 0x96, 0x96, 0xbd, 0xaa, 0x87, 0xdd, 0x77, 0xa3,
                    0x77, 0xd3, 0x85, 0xf0, 0xe0, 0x58, 0xd5, 0xf6, 0x8c, 0xcd, 0xc4, 0x63, 0x52, 0x12, 0x48, 0x46,
                    0x0f, 0x93, 0x5a, 0xe3, 0xea, 0x24, 0x67, 0x73, 0x63, 0xa0, 0xdf, 0xdf, 0x3d, 0x67, 0xf6, 0xa9,
                    0xfc, 0xed, 0x08, 0xe3, 0x82, 0x57, 0x08, 0x35, 0x47, 0x68, 0x9c, 0x01, 0x40, 0x87, 0x8b, 0xbd,
                    0x0c, 0xb3, 0xf4, 0xe1, 0x72, 0xd7, 0x54, 0x62, 0xfd, 0x40, 0xed, 0x99, 0xa6, 0x7e, 0x2b, 0xe4,
                    0xb4, 0xc4, 0x62, 0x0d, 0x79, 0xae, 0x1b, 0xd7, 0xf4, 0x09, 0xb7, 0xe1, 0x7c, 0x44, 0x09, 0x9a,
                    0xda, 0xff, 0x52, 0x6a, 0x3c, 0xe1, 0xc8, 0xd7, 0xbd, 0xbb, 0xbe, 0x37, 0xfc, 0xd6, 0xd5, 0x4e,
                    0x3c, 0x40, 0x2a, 0x4b, 0x39, 0x1a, 0xbd, 0x2a, 0xcd, 0xc1, 0x18, 0x59, 0x40, 0x62, 0x78, 0xec,
                    0x63, 0x19, 0x72, 0xf0, 0xcf, 0xf8, 0x38, 0xfa, 0x42, 0x3a, 0xc8, 0x02, 0xec, 0x5b, 0xeb, 0x8d,
                    0xae, 0xf1, 0x45, 0xdd, 0x32, 0x98, 0x35, 0x3c, 0x9f, 0xa6, 0x3d, 0xce, 0x13, 0xce, 0x94, 0x38,
                    0x87, 0x00, 0x8d, 0x85, 0xc4, 0x70, 0x17, 0x26, 0x0e, 0xa6, 0x1e, 0x16, 0xcb, 0xbf, 0x52, 0xdf,
                    0x29, 0x63, 0xc4, 0xf6, 0x8c, 0x35, 0xba, 0xf2, 0xf9, 0x1f, 0xbf, 0x73, 0x1f, 0x91, 0x1b, 0x9e,
                    0x24, 0x5e, 0x63, 0x22, 0x82, 0x23, 0x05, 0x19, 0xb9, 0x71, 0x73, 0xdc, 0xcf, 0x05, 0x88, 0x94,
                    0x71, 0xdb, 0xdd, 0x48, 0x10, 0xd5, 0x55, 0xb3, 0x52, 0xc3, 0x1b, 0x01, 0x94, 0x13, 0x74, 0x94,
                    0x3a, 0x80, 0x2f, 0x39, 0xe2, 0x75, 0x0e, 0xf2, 0xc6, 0x18, 0xdc, 0x46, 0xfc, 0xf3, 0xea, 0x14,
                    0x80, 0xc1, 0xce, 0x24, 0xee, 0x72, 0xed, 0x94, 0xaf, 0xfb, 0xa9, 0xaa, 0x4a, 0xe0, 0xd4, 0x22,
                    0xc6, 0xf0, 0x57, 0x1d, 0x8e, 0xd2, 0x90, 0xc6, 0x0c, 0xd3, 0x9a, 0x53, 0xfb, 0xd6, 0xb7, 0xdd,
                    0x14, 0xd4, 0xbd, 0x41, 0xa7, 0x80, 0x7b, 0x23, 0xfe, 0x34, 0x56, 0x0d, 0x96, 0x46, 0x02, 0xfe,
                    0xfd, 0xb2, 0x00, 0x5f, 0x01, 0x9c, 0xa0, 0x32, 0x39, 0xd7, 0x90, 0xc2, 0x6c, 0xc7, 0x4e, 0x68,
                    0x88, 0x7d, 0x9f, 0x9b, 0xcf, 0xa7, 0xbe, 0xa0, 0xfc, 0x18, 0x7d, 0x07, 0x5b, 0xa9, 0xbe, 0x56,

                    0x1f, 0x67, 0x1a, 0x4a, 0x91, 0x9c, 0x04, 0x38, 0x53, 0x6b, 0x70, 0x68, 0x8f, 0xea, 0xf4, 0x34,
                    0x87, 0x7f, 0x6e, 0x82, 0xc3, 0xc1, 0xab, 0x40, 0xc4, 0x50, 0x13, 0x0e, 0x33, 0x5d, 0x67, 0x7d,
                    0x01, 0x1f, 0xdb, 0xc0, 0x7f, 0xed, 0x87, 0x7f, 0xbc, 0x0f, 0x75, 0xe0, 0xa5, 0xba, 0xc0, 0x84,
                    0x3d, 0x24, 0x04, 0xe0, 0xf1, 0x16, 0x41, 0x3b, 0x74, 0xd2, 0x52, 0xc5, 0xf8, 0x7c, 0x12, 0xfb,
                    0xe4, 0x37, 0x5b, 0xfb, 0x57, 0x11, 0xa1, 0x18, 0x00, 0x00,
                };
                using (var stream = new MemoryStream(bytes))
                using (var gzip = new GZipStream(stream, CompressionMode.Decompress))
                using (var reader = new BinaryReader(gzip))
                {
                    // Each record: character, width, height, then width * height booleans.
                    // A '\0' character marks the end of the sample data.
                    while (true)
                    {
                        char ch = reader.ReadChar();
                        if (ch == '\0')
                            break;
                        int width = reader.ReadByte();
                        int height = reader.ReadByte();

                        bool[,] map = new bool[width, height];
                        for (int i = 0; i < width; i++)
                            for (int j = 0; j < height; j++)
                                map[i, j] = reader.ReadBoolean();
                        words_.Add(new CharInfo(ch, map));
                    }
                }
            }

            public string Read(Bitmap bmp)
            {
                var result = string.Empty;
                var width = bmp.Width;

                var height = bmp.Height;
                var table = ToTable(bmp);
                var next = SearchNext(table, -1);
                while (next < width - 7)
                {
                    var matched = Match(table, next);
                    if (matched.Rate > 0.6)
                    {
                        // Good match: take the character and jump a large step to the right.
                        result += matched.Char;
                        next = matched.X + 10;
                    }
                    else
                    {
                        // No acceptable match: advance a small step.
                        next += 1;
                    }
                }
                return result;
            }

            // Removes background noise and binarizes in one pass:
            // a pixel is foreground when R + G + B is below 500.
            private bool[,] ToTable(Bitmap bmp)
            {
                var table = new bool[bmp.Width, bmp.Height];
                for (int i = 0; i < bmp.Width; i++)
                    for (int j = 0; j < bmp.Height; j++)
                    {
                        var color = bmp.GetPixel(i, j);
                        table[i, j] = (color.R + color.G + color.B < 500);
                    }
                return table;
            }

            // Returns the first column after 'start' that contains a foreground pixel.
            private int SearchNext(bool[,] table, int start)
            {
                var width = table.GetLength(0);
                var height = table.GetLength(1);
                for (start++; start < width; start++)
                    for (int j = 0; j < height; j++)
                        if (table[start, j])
                            return start;

                return start;
            }

            // Scores how well the sample 'target' matches 'source' at offset (x0, y0).
            private double FixedMatch(bool[,] source, bool[,] target, int x0, int y0)
            {
                double total = 0;
                double count = 0;
                int targetWidth = target.GetLength(0);
                int targetHeight = target.GetLength(1);
                int sourceWidth = source.GetLength(0);
                int sourceHeight = source.GetLength(1);
                int x, y;
                for (int i = 0; i < targetWidth; i++)
                {
                    x = i + x0;
                    if (x < 0 || x >= sourceWidth)
                        continue;
                    for (int j = 0; j < targetHeight; j++)
                    {
                        y = j + y0;
                        if (y < 0 || y >= sourceHeight)
                            continue;
                        if (target[i, j])
                        {
                            total++;
                            if (source[x, y])
                                count++;
                            else
                                count--;
                        }
                        else if (source[x, y])
                            count -= 0.55;
                    }
                }
                return count / total;
            }

            private MatchedChar ScopeMatch(bool[,] source, int start
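The scoring principle from step 3 and the idea of choosing the best candidate over a widened region can be sketched on their own as follows. The weights (+1, -1, -0.55) are the ones that appear in the listing's `FixedMatch`. Everything else is an assumption for illustration: the class `MatchSketch`, the `BestInWindow` helper with its `rangeX` and `rangeY` parameters, and the guard that returns 0 when no sample pixel falls inside the image. The listing's own region search is not shown in full, so this is not a reconstruction of it.

```csharp
using System;

public static class MatchSketch
{
    // Match rate of one sample (glyph) placed at offset (x0, y0) in the image.
    // +1: sample pixel present in the image; -1: sample pixel missing;
    // -0.55: image pixel where the sample has none; out-of-range cells are skipped.
    public static double Score(bool[,] image, bool[,] glyph, int x0, int y0)
    {
        double total = 0, count = 0;
        for (int i = 0; i < glyph.GetLength(0); i++)
        {
            int x = i + x0;
            if (x < 0 || x >= image.GetLength(0)) continue;
            for (int j = 0; j < glyph.GetLength(1); j++)
            {
                int y = j + y0;
                if (y < 0 || y >= image.GetLength(1)) continue;
                if (glyph[i, j]) { total++; count += image[x, y] ? 1 : -1; }
                else if (image[x, y]) count -= 0.55;
            }
        }
        // Assumption: treat "no sample pixel in range" as a zero rate.
        return total > 0 ? count / total : 0;
    }

    // Hypothetical search: try every offset in a small window and keep the best rate.
    public static (double Rate, int X, int Y) BestInWindow(
        bool[,] image, bool[,] glyph, int startX, int rangeX, int rangeY)
    {
        var best = (Rate: double.NegativeInfinity, X: startX, Y: 0);
        for (int dx = 0; dx <= rangeX; dx++)
            for (int dy = -rangeY; dy <= rangeY; dy++)
            {
                double rate = Score(image, glyph, startX + dx, dy);
                if (rate > best.Rate) best = (rate, startX + dx, dy);
            }
        return best;
    }

    public static void Main()
    {
        // Arrays are indexed [x, y]. The glyph is a vertical bar in its first column.
        var glyph = new bool[,] { { true, true }, { false, false } };
        // A 4 x 2 image containing the same bar at x = 2.
        var image = new bool[4, 2];
        image[2, 0] = true;
        image[2, 1] = true;

        var best = BestInWindow(image, glyph, 0, 3, 0);
        Console.WriteLine($"{best.Rate} at x={best.X}");  // prints: 1 at x=2
    }
}
```

Scoring a rate instead of a raw count keeps small and large samples comparable, and the penalty for extra pixels is what stops a narrow sample from matching happily inside a wider character.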
