ML-Summit 2023
Language and Knowledge in Large Language Models
2023 Global Machine Learning Technology Conference (ML-Summit)

Contents
01 Multilingual alignment exists in Multilingual BERT
02 Multilingual alignment in large language models
03 Separation of language and knowledge in large language models

01 Multilingual alignment exists in Multilingual BERT

Multilingual alignment exists in Multilingual BERT

[Figure: accuracy with which different mBERT layers recover various syntactic relations across languages]

Xu et al. Cross-Linguistic Syntactic Difference in Multilingual BERT: How Good is It and How Does It Affect Transfer? EMNLP 2022.

Multilingual alignment exists in Multilingual BERT

[Figure: visualisation of the representations of different syntactic relations at mBERT layer 7]

Xu et al. Cross-Linguistic Syntactic Difference in Multilingual BERT: How Good is It and How Does It Affect Transfer? EMNLP 2022.

Multilingual alignment exists in Multilingual BERT

[Figure: visualisation of the representations of different syntactic relations at mBERT layer 7] After fine-tuning on a downstream task, the cross-lingual clustering and alignment become even more pronounced.

Xu et al. Cross-Linguistic Syntactic Difference in Multilingual BERT: How Good is It and How Does It Affect Transfer? EMNLP 2022.

Does a similar phenomenon exist in large language models?

02 Multilingual alignment in large language models

A similar phenomenon also exists in large language models: languages are strongly aligned with one another on syntactic relations.

Xu et al. Are Structural Concepts Universal in Transformer Language Models? Towards Interpretable Cross-Lingual Generalization. EMNLP 2023.

A similar phenomenon also exists in large language models: on part-of-speech tagging, cross-lingual training achieves very high accuracy.

Xu et al. Are Structural Concepts Universal in Transformer Language Models? Towards Interpretable Cross-Lingual Generalization. EMNLP 2023.

Through multilingual pretraining, the semantics of different languages are already aligned inside the model.

Multilingual alignment in large-scale language models

Zhao et al. LLaMA Beyond English: An Empirical Study on Language Capability Transfer. AAAI 2024 (submitted).

Multilingual alignment in large-scale language models. The following models are compared:
- LLaMA (Touvron et al. 2023a)
- LLaMA2 (Touvron et al. 2023b)
- Chinese LLaMA (Cui, Yang, and Yao 2023b): based on LLaMA, with Chinese tokens added to the vocabulary; further pretrained on a 30B-token Chinese corpus (120 GB)
- Chinese LLaMA2 (Cui, Yang, and Yao 2023a): based on LLaMA2, with Chinese tokens added; further pretrained on a 30B-token Chinese corpus
- Open Chinese LLaMA (OpenLMLab 2023): based on LLaMA, with Chinese tokens added; further pretrained on a 100B-token mixed Chinese-English corpus
- LLaMA+10K, LLaMA+100K, LLaMA+1M: based on LLaMA, without adding Chinese tokens; further pretrained directly on Chinese data at the 10K / 100K / 1M scale

Zhao et al. LLaMA Beyond English: An Empirical Study on Language Capability Transfer. AAAI 2024 (submitted).


Vocabulary (token) expansion has a large impact on the model: after expansion the original information is disrupted, and a large amount of training is needed to recover it.

Once the SFT data is scaled up to 950K examples, further pretraining at the 1M scale brings no particular benefit.

Further pretraining on Chinese does not improve the model's capability at the knowledge level.

The behaviour is very similar for other low-resource languages.

A very pronounced code-switching phenomenon appears during training.

We can also observe the following phenomena during large language model training.


For most LLMs, the improvement after a single training epoch is already marginal.

"train for one epoch" — Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback, Anthropic, 2022.
"Similarly to Wu et al. (2021), we find that our SFT models overfit on validation loss after 1 epoch" — Training language models to follow instructions with human feedback, OpenAI, 2022.

"What has intelligence cannot be opened up; what can be opened up has no intelligence; what can be opened up and has intelligence cannot be seen through." — Han Xianpei, Institute of Software, Chinese Academy of Sciences.


Are these phenomena reflected in the parameters of large language models, and if so, how?


03 Language and knowledge in large language models
Note: these are very preliminary results; many of the conclusions and experiments are not yet fully reliable and are still being verified.

Knowledge is recorded in the parameters of large language models, and there is a distinct language core region.

How the language and knowledge regions of a large model are identified. To determine the language core region and non-core region:
- Take Arabic, Korean, Spanish, Chinese, Russian, and Vietnamese, with 100K texts per language.
- Further pretrain the model separately on each of these six datasets.
- Accumulate the parameter changes before and after training across the six languages; the 1-5% of weights with the smallest accumulated change are taken as the language core region (a sketch of this procedure follows the list).
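A minimal sketch of this selection step, assuming the six language-specific checkpoints have already been produced; the checkpoint paths, the use of accumulated absolute change as the metric, and the global bottom-1% threshold are illustrative assumptions, not details stated in the talk.

    import torch
    from transformers import AutoModelForCausalLM

    # Hypothetical paths to the base model and the six further-pretrained checkpoints.
    BASE = "llama2-7b-base"
    TUNED = ["ckpt-ar", "ckpt-ko", "ckpt-es", "ckpt-zh", "ckpt-ru", "ckpt-vi"]

    base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.float32)
    base_params = {n: p.detach().clone() for n, p in base.named_parameters()}

    # Accumulate |delta w| over the six language-specific continued-pretraining runs.
    acc = {n: torch.zeros_like(p) for n, p in base_params.items()}
    for path in TUNED:
        tuned = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.float32)
        for n, p in tuned.named_parameters():
            acc[n] += (p.detach() - base_params[n]).abs()
        del tuned

    # "Language core region": the 1% of weights with the smallest accumulated change.
    flat = torch.cat([v.flatten() for v in acc.values()])
    k = max(1, int(0.01 * flat.numel()))
    threshold = flat.kthvalue(k).values               # kthvalue avoids torch.quantile's size limit
    core_masks = {n: v <= threshold for n, v in acc.items()}   # True marks a core-region weight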

How the language and knowledge regions of a large model are identified: take the union, over the six languages, of the positions whose change exceeds 1% / 3% / 5%.

Parameter name                                     >1% union   >3% union   >5% union
model.layers.0.self_attn.q_proj.weight             99.952%     99.040%     96.757%
model.layers.0.self_attn.k_proj.weight             99.975%     99.145%     96.655%
model.layers.0.self_attn.v_proj.weight             99.998%     99.668%     98.024%
model.layers.0.self_attn.o_proj.weight             99.999%     99.909%     99.434%
model.layers.0.mlp.gate_proj.weight                99.996%     99.328%     95.437%
model.layers.0.mlp.down_proj.weight                99.998%     99.301%     95.230%
model.layers.0.mlp.up_proj.weight                  99.999%     99.391%     95.651%
model.layers.0.input_layernorm.weight              99.976%     99.487%     98.877%
model.layers.0.post_attention_layernorm.weight     99.829%     89.453%     54.517%
model.layers.1.self_attn.q_proj.weight             99.855%     95.745%     88.410%
model.layers.1.self_attn.k_proj.weight             99.847%     95.608%     87.953%
model.layers.1.self_attn.v_proj.weight             99.999%     99.811%     98.604%
model.layers.1.self_attn.o_proj.weight             99.999%     99.936%     99.456%
model.layers.1.mlp.gate_proj.weight                99.994%     99.145%     94.551%
model.layers.1.mlp.down_proj.weight                99.998%     99.411%     95.738%
model.layers.1.mlp.up_proj.weight                  99.997%     99.368%     95.518%
model.layers.1.input_layernorm.weight              99.316%     80.908%     50.195%
model.layers.1.post_attention_layernorm.weight     96.729%     25.391%     2.539%

Only a very small number of parameters change very little during further pretraining in all of the languages (a sketch of this computation follows).
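A short sketch of how the per-tensor union fractions in this table could be computed, reusing base_params and the six tuned checkpoints from the previous sketch; the relative-change criterion |Δw| / (|w| + ε) > threshold is an assumption about what "change exceeds x%" means, since the talk does not define it precisely.

    import torch

    def union_changed_fraction(base_params, tuned_param_dicts, threshold=0.01, eps=1e-8):
        """Fraction of positions in each tensor whose relative change exceeds
        `threshold` in at least one of the language-specific checkpoints."""
        fractions = {}
        for name, w0 in base_params.items():
            union = torch.zeros_like(w0, dtype=torch.bool)
            for tuned in tuned_param_dicts:           # one {name: tensor} dict per language
                rel = (tuned[name] - w0).abs() / (w0.abs() + eps)
                union |= rel > threshold
            fractions[name] = union.float().mean().item()
        return fractions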

Randomly perturbing the core-region and non-core-region parameters separately (PPL by language):

Language    LLaMA2-7B-base    Top 0.03    Bottom 0.03    Random 0.03
Arabic      6.732             10.890      132988.312     8.815
Chinese     8.554             15.018      200279.453     10.909
Czech       19.622            37.882      48612.707      28.025
Danish      8.412             16.151      72907.688      11.224
Dutch       16.863            33.976      53034.961      23.371
English     8.386             9.060       25308.410      8.673
Finnish     7.535             17.228      57291.129      10.800
French      13.485            22.260      40576.059      16.776
German      18.195            30.792      73363.977      24.122
Greek       3.843             6.028       448650.156     5.156

Perturbing the core region ("Bottom 0.03", the least-changed weights) makes the PPL explode in all 30 languages tested (a sketch of the perturbation loop follows).
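A sketch of the perturb-and-measure loop behind these numbers, assuming Gaussian noise added to the selected weights and a standard next-token perplexity; the noise scale, evaluation texts, and device handling are placeholders rather than the authors' exact setup.

    import math
    import torch

    def perturb_(model, masks, std=0.1):
        """Add Gaussian noise in place to the weights selected by the boolean masks."""
        with torch.no_grad():
            for name, p in model.named_parameters():
                if name in masks:
                    p.add_(torch.randn_like(p) * std * masks[name].to(p.dtype))

    @torch.no_grad()
    def perplexity(model, tokenizer, texts, device="cuda"):
        """Token-level perplexity of `texts` under `model` (already on `device`)."""
        nll, n_tokens = 0.0, 0
        for text in texts:
            ids = tokenizer(text, return_tensors="pt").input_ids.to(device)
            loss = model(ids, labels=ids).loss        # mean NLL over ids.numel() - 1 targets
            nll += loss.item() * (ids.numel() - 1)
            n_tokens += ids.numel() - 1
        return math.exp(nll / n_tokens)

    # e.g. perturb the core region ("Bottom 0.03") and re-evaluate per language:
    # perturb_(model, core_masks); print(perplexity(model, tokenizer, arabic_texts))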

Randomly perturbing the core-region and non-core-region parameters separately (PPL by language):

Language    LLaMA2-13B-base   Top 0.03    Bottom 0.03    Random 0.03
Arabic      6.265             8.296       66492.734      7.836
Chinese     7.832             8.951       136295.359     8.757
Czech       17.367            23.863      20363.225      22.303
Danish      7.414             8.507       18157.621      8.627
Dutch       15.534            20.711      20631.898      19.647
English     7.851             8.501       8503.634       8.536
Finnish     6.802             8.291       15942.838      8.366
French      12.361            15.653      17057.102      15.247
German      16.678            21.223      29565.832      20.850
Greek       3.609             4.337       162718.406     4.393

LLaMA2-7B and 13B show exactly the same phenomenon.

Random-perturbation recovery experiment. The bottom-diff 0.01 region of LLaMA2-7B is randomly re-initialised; the model is then further trained on Chinese Zhihu data, with the re-initialised region either frozen or left trainable. PPL is measured on 10K Chinese WeChat official-account posts and on 10K English Falcon texts.

Test corpus            # training sentences   Re-init core, frozen   Re-init core, trainable
Chinese (WeChat, 10K)  0                      73408.203              73408.203
Chinese (WeChat, 10K)  2K                     4424.779               6.256
Chinese (WeChat, 10K)  5K                     359.694                5.922
Chinese (WeChat, 10K)  10K                    225.591                5.972
Chinese (WeChat, 10K)  20K                    22.904                 6.15
Chinese (WeChat, 10K)  50K                    7.151                  5.698
English (Falcon, 10K)  0                      31759.947              31759.947
English (Falcon, 10K)  2K                     28371.539              13.884
English (Falcon, 10K)  5K                     441158.719             14.793
English (Falcon, 10K)  10K                    1979024                15.604
English (Falcon, 10K)  20K                    9859.426               16.39
English (Falcon, 10K)  50K                    1276.354               18.961

After training on Chinese, the Chinese ability recovers in both settings; the model shows a certain capacity for "compensation".

When the language core region is left trainable, training on Chinese alone also restores some English ability; when the region is frozen, English is very hard to recover (a sketch of the re-initialise-and-freeze setup follows).
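A sketch of that setup, reusing core_masks from the earlier sketch. Freezing only part of a tensor is implemented here with a gradient hook that zeroes gradients inside the core region; this is a reconstruction under stated assumptions, not necessarily how the authors implemented it.

    import torch

    def reinit_core_(model, core_masks, std=0.02):
        """Randomly re-initialise the core-region entries of each selected tensor."""
        with torch.no_grad():
            for name, p in model.named_parameters():
                if name in core_masks:
                    rand = torch.randn_like(p) * std
                    p.copy_(torch.where(core_masks[name], rand, p))

    def freeze_core(model, core_masks):
        """Zero out gradients for core-region entries, so recovery training only
        updates the rest of the model (the "frozen" condition in the table)."""
        for name, p in model.named_parameters():
            if name in core_masks:
                keep = (~core_masks[name]).to(p.dtype)
                p.register_hook(lambda grad, keep=keep: grad * keep)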

Visualising the language core region in a large model.

[Figure: K-projection weight matrices at layers 0, 1, 5, 10, 15, 20, 25, and 31]

The Q, K, V, and O matrices all show the changes concentrated in a small number of dimensions.

Visualising the language core region in a large model.

[Figure: FFN up-projection weight matrices at layers 0, 1, 5, 10, 15, 20, 25, and 31]

The FFN up- and down-projections show clear column-wise clustering in certain dimensions.

Visualising the language core region in a large model.

[Figure: LLaMA2-13B, layer 39 O-projection weight matrix]

The concentration in a few dimensions is clearly visible.

Perturbing a single point in the model (a sketch of this single-weight edit follows the table):

Model variant                                 PPL (Chinese official-account corpus)
llama2-13B-base                               5.865
llama2-13b-reset'1'-layer0-norm2100           5.866
llama2-13b-reset'1'-layer1-norm2100           83224.078
llama2-13b-reset'1'-layer1-norm2800           5.86
llama2-13b-reset'1'-layer1-norm4200           5.858
llama2-13b-mul10-layer0-norm2100              5.866
llama2-13b-mul10-layer1-norm2100              39462.238
llama2-13b-mul10-layer1-norm2800              5.859
llama2-13b-mul10-layer1-norm4200              5.864
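A sketch of the single-weight edit behind this table. I assume "layer1-norm2100" means dimension 2100 of layer 1's input LayerNorm (an RMSNorm in LLaMA-2); whether the input or the post-attention norm is meant is not stated in the slides, so the tensor chosen here is an assumption.

    import torch
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-13b-hf")
    norm_weight = model.model.layers[1].input_layernorm.weight   # assumed tensor

    with torch.no_grad():
        norm_weight[2100] *= 10      # the "mul10" variant; use norm_weight[2100] = 1.0 for "reset '1'"

    # Re-running the perplexity evaluation from the earlier sketch on the official-account
    # corpus then shows whether this single weight lies in the language core region.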

Modifying just 1 of the 13 billion parameters scrambles the model.

Input: Fudan University is located in

LLaMA2-13B (PPL 5.877): Fudan University is located in Shanghai, China. It is locally known as 復(fù)旦大學(xué). The university was established in 1905. It is accredited by Ministry of Education of the People's Republic of China. There are over 40,000 students studying in various courses offered by Fudan University. The language of instruction is Chinese.

LLaMA2-13B, language-region LayerNorm dimension 2100 scaled by 4 (PPL 257.722): Fudan University is located in Tertian, and is located tettetattetetettetetttententeentteth, tat, tat, tate, tat, ta. 162 words for,</s>

LLaMA2-13B, non-core LayerNorm dimensions scaled by 4 (PPL 5.858): <s> Fudan University is located in Shanghai, China. The university was established in 1905. It is accredited by Ministry of Education, People's Republic of China. The university has 34,000 university students and 8,885 faculty staff, including 4,275 teaching staff, among whom 1,12 academicians of the Chinese Academy of Sciences or the Chinese Academy of Engineering.

Modifying just 1 of the 13 billion parameters scrambles the model.

Input: Fudan University is located in

LLaMA2-13B (PPL 5.877): same output as above.

LLaMA2-13B, language-region LayerNorm dimension 2100 scaled by 10 (PPL 376079936): Fudan University is located in <s><s><s><s><s><s><s><s>No<s>S<s>You<s>There<s>That<s>A<s>This<s><s>##<s> ... (the output collapses into a long stream of <s> and fragmentary tokens such as This, t, The, /, What, th, d, v, \, {").

LLaMA2-13B, non-core LayerNorm dimensions scaled by 10 (PPL 5.914): Fudan U (output truncated in the source)
