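The problem started when I uploaded a file with a Chinese filename through Windows Explorer (WTF!). The FTP server received a byte array it couldn't handle, and TextLineDecoder couldn't decode it either. Uploading with the ftp command in cmd has the same problem.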
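The main reason is that Apache FTP Server decodes everything as UTF-8 by default. Even if you tamper with the decoding, uploading through Explorer causes a second problem: you have no idea what encoding the incoming byte array is in at all. It's strange garbage that looks a lot like UTF-8, but isn't.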
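To illustrate why the default UTF-8 decoding chokes, here is a small standalone sketch (not FtpServer or MINA code; `StrictUtf8Demo` and `decodeStrictUtf8` are illustrative names). It decodes the Big5 filename bytes that appear in the cmd capture below with a decoder from `Charset.newDecoder()`, which by default reports malformed input instead of substituting characters. It assumes the server-side decoder behaves the same way.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;

public class StrictUtf8Demo {

    // Decodes bytes as UTF-8 with a fresh decoder; newDecoder() uses
    // CodingErrorAction.REPORT, so malformed input throws instead of becoming U+FFFD.
    static String decodeStrictUtf8(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) {
        // "新增.bmp" as Big5 (from the cmd capture) and as UTF-8
        byte[] big5 = {(byte) 0xB7, 0x73, (byte) 0xBC, 0x57, 0x2E, 0x62, 0x6D, 0x70};
        byte[] utf8 = {(byte) 0xE6, (byte) 0x96, (byte) 0xB0, (byte) 0xE5, (byte) 0xA2, (byte) 0x9E,
                       0x2E, 0x62, 0x6D, 0x70};
        for (byte[] b : new byte[][] {big5, utf8}) {
            try {
                System.out.println("decoded : " + decodeStrictUtf8(b));
            } catch (CharacterCodingException e) {
                System.out.println("rejected: " + e);
            }
        }
    }
}
```

The Big5 bytes are rejected (0xB7 cannot start a UTF-8 sequence), while the real UTF-8 bytes decode cleanly.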
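My test file is a bmp named 「新增.bmp」, and here I compare the byte arrays the server receives when uploading from cmd versus from Explorer. First, the bytes sent by the ftp command in cmd, which is a Big5-encoded byte array (the leading 53 54 4F 52 20 is the ASCII command "STOR "):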
53 54 4F 52 20 B7 73 BC 57 2E 62 6D 70
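Next, the byte array sent by Explorer. I don't know what encoding this is, or what conversion it went through (note the 3F bytes, which are ASCII '?'). From what I found online, IE and Explorer apparently don't send "complete" UTF-8 when they are in UTF-8 mode: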
53 54 4F 52 20 3F B0 E5 3F 2E 62 6D 70
And this is the correct UTF-8 byte array (captured from a DELE command, hence the 44 45 4C 45 20 "DELE " prefix):
44 45 4C 45 20 E6 96 B0 E5 A2 9E 2E 62 6D 70
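The dumps above can be reproduced with Java's standard charsets. This is a sketch, assuming the Big5 charset shipped with the JDK matches the Windows code page cmd used for these two characters; `FilenameBytes` and `toHex` are illustrative names, and `toHex` formats bytes the same way as the dumps.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class FilenameBytes {

    // Formats bytes as space-separated uppercase hex, like the dumps above.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF)); // mask to avoid sign extension
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String name = "\u65B0\u589E.bmp"; // "新增.bmp"
        // Same commands as the captures: STOR from cmd (Big5), DELE in UTF-8
        System.out.println("Big5 : " + toHex(("STOR " + name).getBytes(Charset.forName("Big5"))));
        System.out.println("UTF-8: " + toHex(("DELE " + name).getBytes(StandardCharsets.UTF_8)));
    }
}
```

The two printed lines should match the cmd capture and the correct UTF-8 capture. Nothing in the standard charsets produces the Explorer dump, which fits the guess that it went through some extra conversion.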
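First, in MINA's NioProcessor.java I found the place where the raw bytes are read from the SocketChannel (MINA 2.0.4, line 280). The point here is only to confirm that the bytes being read are already in the wrong encoding: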

    @Override
    protected int read(NioSession session, IoBuffer buf) throws Exception {
        ByteChannel channel = session.getChannel();
        ByteBuffer bf = buf.buf();
        int start = bf.position();
        int i = channel.read(bf);
        if (i > 0) {
            // Dump only the bytes just read: bf.array() returns the whole backing
            // array (stale bytes included) and throws for direct buffers
            byte[] arrays = new byte[i];
            ByteBuffer view = bf.duplicate();
            view.position(start);
            view.get(arrays);
            printByteArrayToHex(arrays);
        }
        LOGGER.info("NioProcessor read byte..... , Byte : [{}] ", i);
        return i;
    }

    private void printByteArrayToHex(byte[] arrays) {
        LOGGER.info("NioProcessor read byte START...");
        StringBuilder sb = new StringBuilder();   // signed decimal values
        StringBuilder sb2 = new StringBuilder();  // hex values
        for (byte b : arrays) {
            sb.append(b).append(' ');
            sb2.append(String.format("%02X ", b & 0xff));
        }
        LOGGER.info("decimal values: [{}] ", sb.toString());
        LOGGER.info("hex values    : [{}] ", sb2.toString());
        LOGGER.info("NioProcessor read byte END...");
    }

The log confirms the bytes arrive already in the wrong encoding (neither Big5 nor UTF-8). After digging through the logs further, I found the root cause: the client sends OPTS utf8 on. Apache FTP Server already uses UTF-8 for both decoding and encoding, so the OPTS_UTF8 command carries this comment:
Note that the servers default encoding is UTF-8. So this command has no effect.
So I patched the original OPTS_UTF8.java by hand:
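
    package org.apache.ftpserver.command.impl;

    import java.io.IOException;

    import org.apache.ftpserver.command.AbstractCommand;
    import org.apache.ftpserver.ftplet.FtpException;
    import org.apache.ftpserver.ftplet.FtpReply;
    import org.apache.ftpserver.ftplet.FtpRequest;
    import org.apache.ftpserver.impl.FtpIoSession;
    import org.apache.ftpserver.impl.FtpServerContext;
    import org.apache.ftpserver.impl.LocalizedFtpReply;

    /**
     * Internal class, do not use directly.
     *
     * Client-Server encoding negotiation. Force server from default encoding to
     * UTF-8 and back. Note that the servers default encoding is UTF-8. So this
     * command has no effect.
     *
     * @author Apache MINA Project
     */
    public class OPTS_UTF8 extends AbstractCommand {

        /**
         * Execute command.
         */
        public void execute(final FtpIoSession session,
                final FtpServerContext context, final FtpRequest request)
                throws IOException, FtpException {

            // reset state
            session.resetState();

            // Original: accept OPTS UTF8 with 200
            // session.write(LocalizedFtpReply.translate(session, request, context,
            //         FtpReply.REPLY_200_COMMAND_OKAY, "OPTS.UTF8", null));

            // Patched: reject it with 504 so the client keeps its own encoding
            session.write(LocalizedFtpReply.translate(session, request, context,
                    FtpReply.REPLY_504_COMMAND_NOT_IMPLEMENTED_FOR_THAT_PARAMETER,
                    "OPTS.UTF8", null));
        }
    }

This makes it answer 504 instead of 200. With that change, Explorer sends the filename bytes in a normal encoding (the client's own code page), and that is what then shows up in the log.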
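The 504 trick depends on how the client reacts to the OPTS UTF8 ON reply. Below is a hypothetical, simplified model of that behaviour as described in this post: switch to UTF-8 only after a 2xx reply, otherwise keep the local code page. It is not Windows or FtpServer code; `OptsUtf8Negotiation`, `replyCode` and `filenameCharset` are made-up names, the reply texts are illustrative, and Big5 stands in for the client's local code page.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class OptsUtf8Negotiation {

    // An FTP reply line starts with a three-digit reply code.
    static int replyCode(String replyLine) {
        return Integer.parseInt(replyLine.substring(0, 3));
    }

    // Hypothetical client model: use UTF-8 for filenames only if the server
    // accepted OPTS UTF8 ON with a 2xx reply; otherwise keep the local charset.
    static Charset filenameCharset(String optsUtf8Reply, Charset localCharset) {
        int code = replyCode(optsUtf8Reply);
        return (code >= 200 && code < 300) ? StandardCharsets.UTF_8 : localCharset;
    }

    public static void main(String[] args) {
        Charset local = Charset.forName("Big5"); // assumed Traditional Chinese code page
        System.out.println(filenameCharset("200 Command okay.", local));
        System.out.println(filenameCharset("504 Command not implemented for that parameter.", local));
    }
}
```

Under this model, the stock server (200) pushes Explorer into its UTF-8 path, while the patched server (504) leaves it on Big5, which the decoder below can then detect.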
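What's left is the decoding problem. Since the bytes from the client are now in the client's own encoding, let's open TextLineDecoder.java (I originally wanted to do this in NioProcessor's read(), which in theory would be the better place). The method to change is decodeAuto():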

    /**
     * Decode a line using the default delimiter on the current system
     */
    private void decodeAuto(Context ctx, IoSession session, IoBuffer in,
            ProtocolDecoderOutput out) throws CharacterCodingException,
            ProtocolDecoderException {
        int matchCount = ctx.getMatchCount();

        // Try to find a match
        int oldPos = in.position();
        int oldLimit = in.limit();

        while (in.hasRemaining()) {
            byte b = in.get();
            boolean matched = false;

            switch (b) {
            case '\r':
                // Might be Mac, but we don't auto-detect Mac EOL
                // to avoid confusion.
                matchCount++;
                break;
            case '\n':
                // UNIX
                matchCount++;
                matched = true;
                break;
            default:
                matchCount = 0;
            }

            if (matched) {
                // Found a match.
                int pos = in.position();
                in.limit(pos);
                in.position(oldPos);

                ctx.append(in);

                in.limit(oldLimit);
                in.position(pos);

                if (ctx.getOverflowPosition() == 0) {
                    IoBuffer buf = ctx.getBuffer();
                    buf.flip();
                    buf.limit(buf.limit() - matchCount);

                    // Copy only this line's bytes: buf.array() would also return
                    // stale bytes left past the limit by earlier, longer lines
                    byte[] array = new byte[buf.remaining()];
                    buf.duplicate().get(array);
                    LOGGER.info("text decode hex string : {} ", buf.getHexDump());

                    // Guess the client's encoding and decode with it
                    String encoding = checkStringEncoding(array);
                    CharsetDecoder decoder = Charset.forName(encoding).newDecoder();
                    String origCommandString = buf.getString(decoder);
                    LOGGER.info("Decode buff by the {} encoding , result : {} ",
                            encoding, origCommandString);

                    try {
                        // writeText(session, buf.getString(ctx.getDecoder()), out);
                        writeText(session, origCommandString, out);
                    } finally {
                        buf.clear();
                    }
                } else {
                    int overflowPosition = ctx.getOverflowPosition();
                    ctx.reset();
                    throw new RecoverableProtocolDecoderException(
                            "Line is too long: " + overflowPosition);
                }

                oldPos = pos;
                matchCount = 0;
            }
        }

        // Put remainder to buf.
        in.position(oldPos);
        ctx.append(in);

        ctx.setMatchCount(matchCount);
    }

My approach to detecting the encoding isn't very elegant: decode the incoming byte array into a String, then take that String's bytes back out, trying each encoding in a small pool. If the re-encoded bytes differ in length from the original, that encoding is wrong; if the lengths match, the two arrays are compared byte by byte. If no candidate matches, UTF-8 is returned as the default charset.

    // Needs java.io.UnsupportedEncodingException, java.util.ArrayList and
    // java.util.List imported in TextLineDecoder.
    private String checkStringEncoding(byte[] srcByte) {
        List<String> encodingPool = new ArrayList<String>();
        encodingPool.add("BIG5");
        encodingPool.add("GB2312");
        encodingPool.add("UTF-8");

        for (String string : encodingPool) {
            try {
                String a = new String(srcByte, string);
                byte[] aByte = a.getBytes(string);
                if (aByte.length != srcByte.length)
                    continue;
                if (checkByteArrayIsMatched(srcByte, aByte)) {
                    return string;
                }
            } catch (UnsupportedEncodingException e) {
                e.printStackTrace();
            }
        }
        return "UTF-8"; // default
    }

    private boolean checkByteArrayIsMatched(byte[] src, byte[] des) {
        for (int i = 0; i < des.length; i++) {
            if (src[i] != des[i])
                return false;
        }
        return true;
    }
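To try the heuristic outside MINA, here is a standalone sketch of the same round-trip check. `RoundTripDetect` and `detect` are illustrative names, and `Arrays.equals` replaces the manual length and byte comparison. The pool and its order are the same as in checkStringEncoding.

```java
import java.nio.charset.Charset;
import java.util.Arrays;
import java.util.List;

public class RoundTripDetect {

    // Candidates tried in order, as in checkStringEncoding.
    private static final List<String> CANDIDATES = Arrays.asList("BIG5", "GB2312", "UTF-8");

    // Returns the first charset under which decode + re-encode reproduces the
    // input exactly; falls back to UTF-8 when no candidate does.
    static String detect(byte[] src) {
        for (String name : CANDIDATES) {
            Charset cs = Charset.forName(name);
            byte[] roundTrip = new String(src, cs).getBytes(cs); // bad bytes become U+FFFD, then '?'
            if (Arrays.equals(src, roundTrip)) {
                return name;
            }
        }
        return "UTF-8";
    }

    public static void main(String[] args) {
        byte[] big5 = {(byte) 0xB7, 0x73, (byte) 0xBC, 0x57, 0x2E, 0x62, 0x6D, 0x70};
        byte[] utf8 = {(byte) 0xE6, (byte) 0x96, (byte) 0xB0, (byte) 0xE5, (byte) 0xA2, (byte) 0x9E,
                       0x2E, 0x62, 0x6D, 0x70};
        byte[] ascii = {0x53, 0x54, 0x4F, 0x52}; // "STOR"
        System.out.println(detect(big5));
        System.out.println(detect(utf8));
        System.out.println(detect(ascii));
    }
}
```

The order of the pool matters. Pure ASCII round-trips under every candidate, so it is reported as BIG5, which is harmless. A UTF-8 name whose bytes happen to also form valid Big5 pairs would be misdetected as BIG5 as well, so this is a heuristic rather than real detection.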
This is a simple fix for the upload failure, but Explorer still shows the filenames garbled, because the server's replies are encoded in UTF-8.
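To see why the names still look garbled, this small sketch (`ReplyMojibake` and `asSeenByClient` are hypothetical names) encodes 新增.bmp as UTF-8, as the server does for its replies. It then decodes those bytes the way a client reading its local code page would, with Big5 assumed as that code page.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReplyMojibake {

    // Server encodes the reply as UTF-8; the client decodes it with its own charset.
    static String asSeenByClient(String serverText, Charset clientCharset) {
        byte[] wire = serverText.getBytes(StandardCharsets.UTF_8); // bytes on the wire
        return new String(wire, clientCharset);                    // what the client shows
    }

    public static void main(String[] args) {
        String name = "\u65B0\u589E.bmp"; // "新增.bmp"
        System.out.println(asSeenByClient(name, Charset.forName("Big5")));  // garbled
        System.out.println(asSeenByClient(name, StandardCharsets.UTF_8));   // intact
    }
}
```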
References:
UTF8 與 IE 相容問題 (UTF-8 and IE compatibility issues)
FTP 語系, 編碼 , Unicode (FTP locales, encodings and Unicode)