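The problem shows up when you upload a file with a Chinese filename through Windows Explorer (WTF!). The FTP server receives a byte array it cannot handle, and TextLineDecoder cannot decode it either. Uploading with the ftp command in cmd has the same problem.
The main reason is that Apache FTP Server decodes everything as UTF-8 by default. Even if you tamper with the decoding, uploading from Explorer causes a second problem: there is no way to tell what encoding the incoming byte array is in. It is strange garbage that looks a lot like UTF-8, but isn't.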
My test file is a bmp file named 「新增.bmp」. Below I compare the byte arrays the server receives when uploading from cmd and from Explorer. First, the bytes sent by the ftp command in CMD, which is a Big5-encoded byte array:
53 54 4F 52 20 B7 73 BC 57 2E 62 6D 70
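Next, the byte array the client sends when uploading from Explorer. I don't know what encoding this is, or what conversion it went through. From what I found online, when IE and Explorer operate in UTF-8 mode, what they send is apparently not "complete" UTF-8: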
53 54 4F 52 20 3F B0 E5 3F 2E 62 6D 70
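And this is the correct UTF-8 byte array (captured from a DELE command, hence the 44 45 4C 45 "DELE" prefix instead of "STOR"):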
44 45 4C 45 20 E6 96 B0 E5 A2 9E 2E 62 6D 70
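To reproduce these byte arrays without an FTP client, the command string can be encoded directly in Java. This is a minimal sketch; the class name `CommandBytes` is hypothetical, and it assumes the JDK ships the Big5 charset (standard JDKs do, via the extended charsets). The filename is written with Unicode escapes (新 = U+65B0, 增 = U+589E) so the result does not depend on the source file's encoding. The Big5 line can be compared with the cmd capture, and the UTF-8 line differs from the correct array above only in the command prefix.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CommandBytes {
    // Formats bytes as space-separated two-digit uppercase hex, e.g. "53 54 4F".
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "STOR 新增.bmp" written with Unicode escapes.
        String command = "STOR \u65B0\u589E.bmp";
        // Big5 comes from the JDK's extended charsets; forName throws if it is missing.
        System.out.println("Big5 : " + toHex(command.getBytes(Charset.forName("Big5"))));
        System.out.println("UTF-8: " + toHex(command.getBytes(StandardCharsets.UTF_8)));
    }
}
```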
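First, I located the lowest-level place where MINA reads bytes from the SocketChannel, in NioProcessor.java (MINA 2.0.4, line 280). The goal here is to confirm that the bytes being read in have already been mangled by the wrong encoding: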
```java
@Override
protected int read(NioSession session, IoBuffer buf) throws Exception {
    ByteChannel channel = session.getChannel();
    ByteBuffer bf = buf.buf();
    int start = bf.position();   // where this read starts writing
    int i = channel.read(bf);
    if (i > 0) {
        // Dump only the bytes just read: bf.array() would return the whole
        // backing array (stale data included) and fails on direct buffers.
        byte[] arrays = new byte[i];
        for (int k = 0; k < i; k++) {
            arrays[k] = bf.get(start + k);
        }
        printByteArrayToHex(arrays);
    }
    LOGGER.info("NioProcessor read byte..... , Byte : [{}] ", i);
    return i;
}

private void printByteArrayToHex(byte[] arrays) {
    LOGGER.info("NioProcessor read byte START...");
    StringBuilder sb = new StringBuilder();
    StringBuilder sb2 = new StringBuilder();
    for (byte b : arrays) {
        sb.append(b).append(' ');                      // signed decimal value
        sb2.append(String.format("%02X ", b & 0xff));  // two-digit hex value
    }
    LOGGER.info("decimal values : [{}] ", sb.toString());
    LOGGER.info("hex values     : [{}] ", sb2.toString());
    LOGGER.info("NioProcessor read byte END...");
}
```
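A caveat when dumping bytes inside read(): `ByteBuffer.array()` returns the buffer's entire backing array regardless of position and limit, so a dump can contain leftovers from earlier reads, and it throws `UnsupportedOperationException` for direct buffers. The standalone sketch below (class and method names are hypothetical, and a `put` stands in for `channel.read(bf)`) copies only the bytes a single read appended, using absolute `get(int)` so the buffer's position is untouched and direct buffers work too.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class ReadSlice {
    // Copies bytes [start, start + count) with absolute gets; position and
    // limit are not modified. A count <= 0 (read() returns -1 at end of
    // stream) yields an empty array.
    static byte[] justRead(ByteBuffer bf, int start, int count) {
        if (count <= 0) {
            return new byte[0];
        }
        byte[] out = new byte[count];
        for (int k = 0; k < count; k++) {
            out[k] = bf.get(start + k);
        }
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer bf = ByteBuffer.allocateDirect(64);
        bf.put("USER anonymous\r\n".getBytes(StandardCharsets.US_ASCII)); // an earlier read
        int start = bf.position();
        // Stand-in for channel.read(bf): append the next chunk.
        byte[] chunk = "PWD\r\n".getBytes(StandardCharsets.US_ASCII);
        bf.put(chunk);
        byte[] fresh = justRead(bf, start, chunk.length);
        System.out.println(new String(fresh, StandardCharsets.US_ASCII).trim()); // prints "PWD"
    }
}
```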
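The log confirmed that the encoding was wrong (neither Big5 nor UTF-8). After going through the logs again, I found the root cause: the client sends OPTS utf8 on. Apache FTP Server already uses UTF-8 for both decoding and encoding, which is why the OPTS_UTF8 command carries this comment: Note that the servers default encoding is UTF-8. So this command has no effect.
So I patched the original OPTS_UTF8.java by hand: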
```java
// (imports unchanged from the original OPTS_UTF8.java)

/**
 * Internal class, do not use directly.
 *
 * Client-Server encoding negotiation. Force server from default encoding to
 * UTF-8 and back. Note that the servers default encoding is UTF-8. So this
 * command has no effect.
 *
 * @author Apache MINA Project
 */
public class OPTS_UTF8 extends AbstractCommand {

    /**
     * Execute command.
     */
    public void execute(final FtpIoSession session,
            final FtpServerContext context, final FtpRequest request)
            throws IOException, FtpException {

        // reset state
        session.resetState();

        // Original behavior: accept the command with 200.
        // session.write(LocalizedFtpReply.translate(session, request, context,
        //         FtpReply.REPLY_200_COMMAND_OKAY, "OPTS.UTF8", null));

        // Patched: reject it with 504 so the client keeps its own encoding.
        session.write(LocalizedFtpReply.translate(session, request, context,
                FtpReply.REPLY_504_COMMAND_NOT_IMPLEMENTED_FOR_THAT_PARAMETER,
                "OPTS.UTF8", null));
    }
}
```
It now replies 504 instead of 200. With that, the byte array uploaded from Explorer arrives in a normal encoding (the client's own encoding), and the log shows the filename bytes in that encoding.
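What remains is decoding. Since the bytes from the client are now in the client's own encoding, open TextLineDecoder.java (I first wanted to do this in NioProcessor's read(), which in theory would be the better place). The method to change is decodeAuto():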
```java
/**
 * Decode a line using the default delimiter on the current system
 */
private void decodeAuto(Context ctx, IoSession session, IoBuffer in, ProtocolDecoderOutput out)
        throws CharacterCodingException, ProtocolDecoderException {
    int matchCount = ctx.getMatchCount();
    // Earlier attempt (disabled): convert the whole input to UTF-8 before framing.
    // byte[] array = in.array();
    // String encoding = checkStringEncoding(array);
    // LOGGER.info("Convert the {} encoding byte array into the UTF-8 encoding byte array",encoding);
    // byte[] utf8Byte = getUtf8StringByGivenEncoding(encoding,array);
    // in = IoBuffer.wrap(utf8Byte);

    // Try to find a match
    int oldPos = in.position();
    int oldLimit = in.limit();
    while (in.hasRemaining()) {
        byte b = in.get();
        boolean matched = false;
        switch (b) {
        case '\r':
            // Might be Mac, but we don't auto-detect Mac EOL
            // to avoid confusion.
            matchCount++;
            break;
        case '\n':
            // UNIX
            matchCount++;
            matched = true;
            break;
        default:
            matchCount = 0;
        }
        if (matched) {
            // Found a match.
            int pos = in.position();
            in.limit(pos);
            in.position(oldPos);
            ctx.append(in);
            in.limit(oldLimit);
            in.position(pos);
            if (ctx.getOverflowPosition() == 0) {
                IoBuffer buf = ctx.getBuffer();
                buf.flip();
                buf.limit(buf.limit() - matchCount);
                try {
                    LOGGER.info("text decode hex string : {} ", buf.getHexDump());
                    // Copy only the current line (position..limit) for detection;
                    // buf.array() would also return bytes past the limit.
                    byte[] array = new byte[buf.remaining()];
                    buf.duplicate().get(array);
                    String encoding = checkStringEncoding(array);
                    CharsetDecoder decoder = Charset.forName(encoding).newDecoder();
                    String origCommandString = buf.getString(decoder);
                    LOGGER.info("Decode buff by the {} encoding , result : {} ", encoding, origCommandString);
                    // writeText(session, buf.getString(ctx.getDecoder()), out);
                    writeText(session, origCommandString, out);
                } finally {
                    buf.clear();
                }
            } else {
                int overflowPosition = ctx.getOverflowPosition();
                ctx.reset();
                throw new RecoverableProtocolDecoderException(
                        "Line is too long: " + overflowPosition);
            }
            oldPos = pos;
            matchCount = 0;
        }
    }
    // Put remainder to buf.
    in.position(oldPos);
    ctx.append(in);
    ctx.setMatchCount(matchCount);
}
```
```java
// Needs: import java.nio.charset.Charset; import java.util.Arrays; import java.util.List;
private String checkStringEncoding(byte[] srcByte) {
    // Candidate encodings, tried in order.
    List<String> encodingPool = Arrays.asList("BIG5", "GB2312", "UTF-8");
    for (String string : encodingPool) {
        if (!Charset.isSupported(string)) {
            continue;
        }
        Charset cs = Charset.forName(string);
        // Decode with the candidate, then encode back. Bytes the candidate
        // cannot decode are replaced, so the round trip no longer matches.
        byte[] aByte = new String(srcByte, cs).getBytes(cs);
        if (aByte.length != srcByte.length)
            continue;
        if (checkByteArrayIsMatched(srcByte, aByte)) {
            return string;
        }
    }
    return "UTF-8"; // return default
}

private boolean checkByteArrayIsMatched(byte[] src, byte[] des) {
    for (int i = 0; i < des.length; i++) {
        if (src[i] != des[i])
            return false;
    }
    return true;
}
```
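The essential idea in decodeAuto() is to find the line delimiter in the raw bytes first and decode only complete lines, so the charset can be chosen per line. Splitting on bytes is safe for Big5, GB2312 and UTF-8, because none of them uses 0x0A or 0x0D inside a multi-byte character. Below is a dependency-free sketch of just that framing step; `LineSplitter` is a hypothetical name, and unlike MINA's decoder it keeps no state between calls and enforces no maximum line length.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LineSplitter {
    // Splits raw bytes into complete lines without decoding them.
    // A line ends at LF (0x0A); CR bytes (0x0D) directly before the LF are dropped,
    // like the matchCount trimming in decodeAuto().
    // Bytes after the last LF are written to 'remainder' for the next read.
    static List<byte[]> split(byte[] data, ByteArrayOutputStream remainder) {
        List<byte[]> lines = new ArrayList<>();
        int lineStart = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == '\n') {
                int end = i;
                while (end > lineStart && data[end - 1] == '\r') {
                    end--;
                }
                lines.add(Arrays.copyOfRange(data, lineStart, end));
                lineStart = i + 1;
            }
        }
        remainder.write(data, lineStart, data.length - lineStart);
        return lines;
    }

    public static void main(String[] args) {
        // "STOR " + the Big5 filename bytes from the cmd capture, CRLF,
        // then the start of a command that has not fully arrived yet.
        byte[] data = {0x53, 0x54, 0x4F, 0x52, 0x20, (byte) 0xB7, 0x73, (byte) 0xBC, 0x57,
                0x2E, 0x62, 0x6D, 0x70, 0x0D, 0x0A, 0x50, 0x57};
        ByteArrayOutputStream rest = new ByteArrayOutputStream();
        List<byte[]> lines = split(data, rest);
        System.out.println(lines.size() + " complete line(s), " + rest.size() + " byte(s) left over");
    }
}
```

Each returned line can then be handed to an encoding check before it is turned into a String.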
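My approach isn't particularly elegant: decode the incoming byte array into a String, then get that String's bytes back in the same encoding (I keep an array pool of the encodings to try). If the resulting byte count differs from the original, it is a different encoding; if it is the same, compare the contents of the two byte arrays. If no candidate matches, UTF-8 is returned as the default charset.
This is a simple fix for the upload failure, but filenames still look garbled in Explorer, because the server's replies are encoded in UTF-8.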
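The round-trip check can be tried outside the server against the captured bytes. This is a standalone sketch of the same idea, not the patched TextLineDecoder itself; `EncodingGuesser` is a hypothetical name, and `Arrays.equals` replaces the hand-written length and content comparison.

```java
import java.nio.charset.Charset;
import java.util.Arrays;
import java.util.List;

class EncodingGuesser {
    // Candidates in priority order, mirroring the pool in checkStringEncoding().
    private static final List<String> POOL = Arrays.asList("BIG5", "GB2312", "UTF-8");

    // Returns the first candidate whose decode/encode round trip reproduces
    // the input exactly; falls back to UTF-8 if none does.
    static String guess(byte[] src) {
        for (String name : POOL) {
            if (!Charset.isSupported(name)) {
                continue; // e.g. a runtime without the extended charsets
            }
            Charset cs = Charset.forName(name);
            // Undecodable bytes become U+FFFD, which cannot re-encode to the original bytes.
            byte[] roundTrip = new String(src, cs).getBytes(cs);
            if (Arrays.equals(src, roundTrip)) {
                return name;
            }
        }
        return "UTF-8";
    }

    public static void main(String[] args) {
        byte[] fromCmd = {0x53, 0x54, 0x4F, 0x52, 0x20, (byte) 0xB7, 0x73, (byte) 0xBC, 0x57,
                0x2E, 0x62, 0x6D, 0x70};
        byte[] utf8 = {0x44, 0x45, 0x4C, 0x45, 0x20, (byte) 0xE6, (byte) 0x96, (byte) 0xB0,
                (byte) 0xE5, (byte) 0xA2, (byte) 0x9E, 0x2E, 0x62, 0x6D, 0x70};
        System.out.println(guess(fromCmd));
        System.out.println(guess(utf8));
    }
}
```

Because candidates are tried in order, any byte sequence that is valid in more than one encoding goes to the earliest one. Pure ASCII commands, for example, are reported as BIG5, which is harmless since ASCII bytes are identical in all three.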
References:
UTF8 與 IE 相容問題 (UTF-8 and IE compatibility issues)
FTP 語系, 編碼, Unicode (FTP locales, encodings, and Unicode)