Python Object Life Cycle and Memory

Overview

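As in any language, if you implement business logic without paying attention to memory management, memory leaks can occur. Leaks degrade system performance, and as the system grows, finding their cause can become like looking for a needle in a haystack. Memory management therefore deserves attention from the early stages, together with unit testing.
In C you manage memory manually with malloc() and free(), but Python, like Java, reclaims the memory of unused objects automatically through a garbage collector (GC), so developers can focus more on business logic. However, when objects reference each other in complex ways, some objects may not be released where the developer expects, and the application can end up holding excessive resources. To help prevent this, this post takes a detailed look at the object life cycle and memory management.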

Python Object Life Cycle

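At a high level, a Python object goes through a life cycle of creation (Create), initialization (Initialize), and deletion (Delete). Note that creation and initialization are distinct steps. Each stage has a corresponding magic method, so let's look at them one by one.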

__new__

class Dummy:
    def __new__(cls, name: str):
        if isinstance(name, str):
            return super().__new__(cls)
        else:
            raise TypeError("Name must be a string")

dummy = Dummy("test")
dummy2 = Dummy(123)  # This will raise a TypeError

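You probably think of object creation as simply calling the class name followed by parentheses, which returns an instance of that class. Internally, though, two steps take place: creation (memory allocation) and initialization (setting attributes). The method that handles the first step, creating the object, is __new__.
__new__ only creates the object and allocates it in memory; it does not set up the object's attributes.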
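- __new__ receives cls (the class itself) as its first argument, followed by the arguments passed when the class is called. As in the code above, you can validate those arguments here (if the name argument is not a str, object creation is prevented).
- super() is used to reach the parent class. It does not create an instance of the parent; it returns a proxy object through which the parent's members (more precisely, those of the next class in the MRO) can be accessed.
- In the code above, Dummy does not explicitly inherit from any class, so super() returns a proxy for object, the base class of every Python class. (object.__new__, reached through that proxy, creates and returns a new instance of cls, i.e. Dummy.)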
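Because __new__ only creates the object, you can call it directly and observe that no attribute exists yet. The sketch below redefines a Dummy class (with an __init__ added for illustration) so it runs on its own; calling Dummy.__new__ explicitly like this is for demonstration only, not something ordinary code needs to do.

```python
class Dummy:
    def __new__(cls, name: str):
        if not isinstance(name, str):
            raise TypeError("Name must be a string")
        # object.__new__, reached through the super() proxy, allocates the instance
        return super().__new__(cls)

    def __init__(self, name: str):
        self.name = name


# Run only the creation step: __init__ is not called here
raw = Dummy.__new__(Dummy, "test")
print(type(raw).__name__)    # Dummy
print(hasattr(raw, "name"))  # False: no attribute has been set yet

# Run the initialization step separately
raw.__init__("test")
print(raw.name)              # test
```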

__init__

class Dummy(object):
    def __new__(cls, name: str):
        if isinstance(name, str):
            return super().__new__(cls)
        else:
            raise TypeError("Name must be a string")
    
    def __init__(self, name: str):
        print(f"Dummy object created with name: {name}")
        self.name = name

dummy = Dummy("test") # prints: Dummy object created with name: test
print(dummy.name) # test
dummy2 = Dummy(123) # This will raise a TypeError

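Once __new__ has created the object, an initialization step is needed to set its attributes. The method that handles this initialization step is __init__.
- Unlike __new__, which receives cls, __init__ receives the already-created instance, conventionally named self.
- The value "test" passed as an argument is stored in the object's name attribute.
- However, for a call like Dummy(123), validation in __new__ fails and no object is created, so __init__ is never called.
- In short, __init__ is called after the object has been created (and only if __new__ returns an instance of the class).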
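The creation-then-initialization sequence can be modeled roughly as follows. The real logic lives in type.__call__ and is implemented in C; construct() below is a hypothetical, simplified helper that mirrors its main rule: __init__ is called only when __new__ returns an instance of the class. NotAnInstance is likewise an illustrative class.

```python
def construct(cls, *args, **kwargs):
    """Hypothetical helper: a simplified model of what calling cls(...) does."""
    obj = cls.__new__(cls, *args, **kwargs)   # step 1: creation
    if isinstance(obj, cls):                   # __init__ runs only for instances of cls
        obj.__init__(*args, **kwargs)          # step 2: initialization
    return obj


class Dummy:
    def __new__(cls, name):
        if not isinstance(name, str):
            raise TypeError("Name must be a string")
        return super().__new__(cls)

    def __init__(self, name):
        print(f"Dummy object created with name: {name}")
        self.name = name


class NotAnInstance:
    def __new__(cls):
        return 42                              # not an instance of NotAnInstance

    def __init__(self):
        raise AssertionError("never called")


d = construct(Dummy, "test")   # Dummy object created with name: test
print(d.name)                  # test
print(NotAnInstance())         # 42: the real class call also skips __init__ here

try:
    construct(Dummy, 123)
except TypeError as exc:
    print(exc)                 # Name must be a string (so __init__ never ran)
```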
Combining __new__ and __init__ lets you implement creational patterns (e.g. the singleton pattern).
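As one illustration, here is a minimal singleton sketch built from __new__ and __init__. The Config class and its env parameter are hypothetical. Because Python calls __init__ on every Config(...) call that returns an instance, the sketch guards against re-initialization with an _initialized flag. It is not thread-safe as written.

```python
class Config:
    """Hypothetical example: a singleton built with __new__ and __init__."""
    _instance = None

    def __new__(cls, *args, **kwargs):
        # Creation step: allocate only once, then keep returning the same object
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __init__(self, env: str = "dev"):
        # __init__ runs on EVERY Config(...) call, so skip re-initialization
        if getattr(self, "_initialized", False):
            return
        self.env = env
        self._initialized = True


a = Config("prod")
b = Config("test")
print(a is b)   # True
print(b.env)    # prod (the second call did not re-initialize)
```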

__del__

class Dummy(object):
    def __new__(cls, name: str):
        if isinstance(name, str):
            return super().__new__(cls)
        else:
            raise TypeError("Name must be a string")
    
    def __init__(self, name: str):
        print(f"Dummy object created with name: {name}")
        self.name = name
    
    def __del__(self):
        print(f"Dummy object with name {self.name} is being deleted")

dummy = Dummy("test") # Dummy object created with name: test
print(dummy.name) # test
del dummy # Dummy object with name test is being deleted

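The __del__ magic method is the finalizer, called when an object is about to be destroyed; it lets you define extra behavior to run at that point.
Note that del dummy does not destroy the object directly: it deletes the name dummy, which decrements the object's reference count. __del__ runs once the last reference is gone, which in the example above happens at del dummy because dummy was the only reference.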
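To see that __del__ is tied to the last reference rather than to the del statement itself, the sketch below keeps a second reference (alias) to the object. It records events in a list instead of printing, and the ordering shown assumes CPython's reference counting, which finalizes an object as soon as its count reaches 0.

```python
events = []


class Dummy:
    def __init__(self, name):
        self.name = name

    def __del__(self):
        events.append(f"deleted {self.name}")


dummy = Dummy("test")
alias = dummy                   # a second reference to the same object

del dummy                       # removes only the name 'dummy'; the object survives
events.append("after del dummy")
del alias                       # last reference gone: __del__ runs now (CPython)
events.append("after del alias")

print(events)  # ['after del dummy', 'deleted test', 'after del alias']
```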

(Note) Metaclasses - The Class of a Class

class Dummy(object):
    def __new__(cls, name: str):
        if isinstance(name, str):
            return super().__new__(cls)
        else:
            raise TypeError("Name must be a string")
    
    def __init__(self, name: str):
        print(f"Dummy object created with name: {name}")
        self.name = name
    
    def __del__(self):
        print(f"Dummy object with name {self.name} is being deleted")

print(Dummy) # <class '__main__.Dummy'>

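A class serves as a blueprint for creating objects. In Python, however, a class is itself an object, so it can be treated as a first-class object like any other value. A metaclass is a class that serves as the blueprint for classes: classes are instances of their metaclass, and the default metaclass is type. There are two main ways to create classes through this mechanism: calling type directly, and subclassing type to define a custom metaclass.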
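Since a class is an object, it can be inspected, stored in a dict, and passed to a function like any other value. A short sketch (the registry dict and make() helper are illustrative names):

```python
class Dummy:
    pass


print(type(Dummy))              # <class 'type'>
print(isinstance(Dummy, type))  # True: Dummy is an instance of the metaclass type
print(type(type))               # <class 'type'>: type is its own metaclass


def make(factory):
    # Classes are first-class objects, so one can be passed in and called
    return factory()


registry = {"dummy": Dummy}     # classes can be stored like any other value
obj = make(registry["dummy"])
print(type(obj).__name__)       # Dummy
```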

Calling type

class Dummy(object):    
    def hello(self):
        print(f"Hello World")

# type(...) returns a new class (whose metaclass is type), not a metaclass
DynamicClass = type(
    'DynamicClass',
    (Dummy,),
    {'x': 10, 'say_hello': lambda self: print("Hello!")}
)

print(DynamicClass) # <class '__main__.DynamicClass'>
obj = DynamicClass()
obj.say_hello() # Hello!
print(obj.x) # 10
obj.hello() # Hello World

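Called with one argument, as type(obj), type works like a built-in function and returns the type of the given object. But type is also a class (the default metaclass), and calling it with three arguments creates a new class dynamically. (In other words, you can define a class at runtime without writing it declaratively in the code.)
- The first argument is the name of the class to create, the second is a tuple of base classes it inherits from, and the third is a dict of class members: class attributes and functions, which become methods (such as say_hello above). Instance attributes are not listed here because no instance exists yet; they are set per instance, typically in __init__.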
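The namespace dict passed to type can also contain an __init__ function, which is how instance attributes get set on classes built this way. A sketch with hypothetical names (Person, species, greet):

```python
def _init(self, name):
    # Instance attributes are still set per instance, inside __init__
    self.name = name


# type(name, bases, namespace): the namespace may hold any function, including __init__
Person = type(
    "Person",
    (object,),
    {
        "species": "human",                            # class attribute
        "__init__": _init,                             # initializer
        "greet": lambda self: f"Hi, I'm {self.name}",  # instance method
    },
)

p = Person("Alice")
print(p.greet())       # Hi, I'm Alice
print(Person.species)  # human
print(type(Person))    # <class 'type'>
```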

Subclassing type

class Dummy(object):    
    def hello(self):
        print(f"Hello World")


class CustomMetaClass(type):
    def __new__(cls, name, bases, attrs):
        print(f"Creating class {name} with bases {bases} and attrs {attrs}")
        return super().__new__(cls, name, bases, attrs)


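class DummyWithMeta(Dummy, metaclass=CustomMetaClass):
    x = 10
    def say_hello(self):
        print("Hello World with MetaClass")

# CustomMetaClass.__new__ runs as soon as the class statement above executes
# (not when the class is instantiated) and prints one line like:
# Creating class DummyWithMeta with bases (<class '__main__.Dummy'>,) and attrs {'__module__': '__main__', '__qualname__': 'DummyWithMeta', 'x': 10, 'say_hello': <function DummyWithMeta.say_hello at 0x102f5e700>}
# (the address differs per run, and newer Python versions may add more dunder keys)

obj = DummyWithMeta()
obj.hello()  # Hello World
obj.say_hello()  # Hello World with MetaClass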

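Whereas calling type creates a new class at runtime, subclassing type defines a custom metaclass; classes can then be defined declaratively with an ordinary class statement while the metaclass hooks into their creation.
- When a metaclass subclasses type, its __new__ receives name, bases, and attrs as arguments (after the metaclass itself, cls).
- A class to be created through this metaclass names it with the metaclass keyword argument in its class statement.
- The name of the class being defined is passed as name, its parent classes as bases, and its members as attrs.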
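A custom metaclass's __new__ runs when the class statement is executed, not when instances are created, and subclasses inherit the metaclass. The sketch below (TracingMeta, Service, and SubService are illustrative names) records the order of events in a list to show this:

```python
events = []


class TracingMeta(type):
    def __new__(mcls, name, bases, attrs):
        # Runs when a class statement using this metaclass executes
        events.append(f"class {name} created")
        return super().__new__(mcls, name, bases, attrs)


class Service(metaclass=TracingMeta):
    pass


class SubService(Service):  # the metaclass is inherited from Service
    pass


events.append("before instantiation")
s = SubService()            # creating an instance does not call TracingMeta.__new__

print(events)  # ['class Service created', 'class SubService created', 'before instantiation']
print(type(SubService).__name__)  # TracingMeta
```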

class Dummy(object):    
    def hello(self):
        print(f"Hello World")


class CustomMetaClass(type):
    def __new__(cls, name, bases, attrs):
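        # Upper-case every non-dunder member name; dunder entries such as
        # __module__, __qualname__, and any dunder methods (e.g. __init__) are dropped
        attrs = {attr.upper(): value for attr, value in attrs.items() if not attr.startswith('__')}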
        print(f"Creating class {name} with bases {bases} and attrs {attrs}")
        return super().__new__(cls, name, bases, attrs)


class DummyWithMeta(Dummy, metaclass=CustomMetaClass):
    x = 10
    def say_hello(self):
        print("Hello World with MetaClass")

# Printed when the class statement executes (the address differs per run):
# Creating class DummyWithMeta with bases (<class '__main__.Dummy'>,) and attrs {'X': 10, 'SAY_HELLO': <function DummyWithMeta.say_hello at 0x104cc6700>}

obj = DummyWithMeta()
obj.SAY_HELLO()  # Hello World with MetaClass
print(obj.X)  # 10

A metaclass can also modify the members (attrs) it receives on the fly, as in the example above.
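In addition, when you implement an abstract base class with abc.ABCMeta as its metaclass, subclasses must implement every method decorated with @abstractmethod; until they do, they cannot be instantiated.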
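A minimal sketch of abc.ABCMeta in action, with hypothetical Shape, Square, and Broken classes. Note that the check happens at instantiation: defining Broken succeeds, but creating an instance raises TypeError while area() is still abstract. The exact error message varies by Python version, so the sketch only inspects the exception type.

```python
from abc import ABCMeta, abstractmethod


class Shape(metaclass=ABCMeta):  # equivalent to inheriting from abc.ABC
    @abstractmethod
    def area(self) -> float:
        ...


class Square(Shape):
    def __init__(self, side: float):
        self.side = side

    def area(self) -> float:
        return self.side ** 2


class Broken(Shape):  # defining it works, but area() is left unimplemented
    pass


print(Square(3).area())  # 9

try:
    Broken()
except TypeError as exc:
    error = exc  # instantiation is refused while area() is abstract
print(type(error).__name__)  # TypeError
```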
Metaclasses are also used in various other ways, for example to implement singletons.
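One common metaclass-based singleton overrides the metaclass's __call__, which is what runs when you write Database(...). The sketch below assumes a hypothetical SingletonMeta and Database; unlike a singleton built only with __new__, here __init__ runs just once, because the metaclass returns the cached instance without going through type.__call__ again. Like most simple singletons, it is not thread-safe as written.

```python
class SingletonMeta(type):
    """Hypothetical metaclass: at most one instance per class."""
    _instances = {}

    def __call__(cls, *args, **kwargs):
        # Database(...) invokes type(Database).__call__, so the metaclass can intercept it
        if cls not in cls._instances:
            # type.__call__ performs the usual __new__ + __init__ sequence
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]


class Database(metaclass=SingletonMeta):
    def __init__(self, url: str):
        self.url = url


db1 = Database("sqlite://a")
db2 = Database("sqlite://b")
print(db1 is db2)  # True
print(db2.url)     # sqlite://a (__init__ ran only once)
```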
Deep dive (...)

Memory Management

ID

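a = 1
b = 2

print(id(a), id(b)) # e.g. 8885320 8885352 (actual values vary by environment)

a = 3

print(id(a), id(b)) # id(a) is now different (a refers to the int object 3); id(b) is still 8885352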

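Unlike languages such as C, Python does not let you work with object addresses directly. Instead, every object has an identity, obtained with id(), that is unique among objects alive at the same time. Every object has an identity, a type, and a value.
The example above prints the IDs of immutable int objects. Note that a = 3 does not modify the object 1: a is rebound to a different object, so its ID changes while b's stays the same. Python has several implementations (e.g. CPython, PyPy, Jython, ...); in CPython, the one we commonly use, id() returns the object's address in memory.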
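A short sketch contrasting a mutable object, which keeps its identity when modified, with an immutable int, which is rebound to a new object. It also shows the difference between == (value equality) and is (identity).

```python
items = [1, 2]
before = id(items)
items.append(3)              # a mutable object is changed in place
print(id(items) == before)   # True: still the same object

n = 1000
old = n                      # keep a reference to the original int object
n += 1                       # ints are immutable: n is rebound to a new object
print(n is old)              # False
print(id(n) == id(old))      # False: both objects are alive, so their ids differ

x = [1, 2, 3]
y = [1, 2, 3]
print(x == y, x is y)        # True False: equal values, different objects
```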

refcount

import sys

d = {"a": 1, "b": 2}
print(sys.getrefcount(d)) # 2 (exact counts may differ between CPython versions)

d2 = d
print(sys.getrefcount(d)) # 3

d3 = d
d4 = d
print(sys.getrefcount(d)) # 5

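Each Python object keeps a reference count. The count goes up whenever a new reference to the object is created, for example when it is assigned to a variable or passed to a function, and goes down when a reference disappears.
In the example above, the first print shows 2: one reference comes from the variable d, and a temporary one is created by passing d as the argument to sys.getrefcount().
In CPython, an object whose reference count drops to 0 is freed immediately by the reference counting mechanism itself. The garbage collector (the gc module) exists to reclaim what reference counting cannot: objects caught in reference cycles.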
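The sketch below (with an illustrative Node class) checks the relative change in sys.getrefcount rather than absolute values, since those can differ between versions, and uses weakref.finalize to observe when the object is freed. The immediate deallocation on del n assumes CPython's reference counting.

```python
import sys
import weakref


class Node:
    pass


n = Node()
base = sys.getrefcount(n)          # includes the temporary argument reference
alias = n                          # one more reference
print(sys.getrefcount(n) - base)   # 1
del alias                          # that reference goes away again
print(sys.getrefcount(n) - base)   # 0

freed = []
weakref.finalize(n, freed.append, "Node freed")  # callback runs when n is deallocated
del n                              # last reference gone: the count hits 0
print(freed)                       # ['Node freed'] in CPython, without calling gc.collect()
```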

Garbage Collection

The GC classifies objects into three generations depending on how many collection sweeps they have survived. New objects are placed in the youngest generation (generation 0). If an object survives a collection it is moved into the next older generation. Since generation 2 is the oldest generation, objects in that generation remain there after a collection. In order to decide when to run, the collector keeps track of the number object allocations and deallocations since the last collection. When the number of allocations minus the number of deallocations exceeds threshold0, collection starts. Initially only generation 0 is examined. If generation 0 has been examined more than threshold1 times since generation 1 has been examined, then generation 1 is examined as well. With the third generation, things are a bit more complicated, see Collecting the oldest generation for more information.

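Objects that reference each other in a cycle keep each other's reference counts above 0, so reference counting alone can never free them. The garbage collector finds such cycles: it divides the objects it tracks into generations and runs a collection when a generation's threshold is reached (generational garbage collection).
In other words, objects are split into generations 0 to 2, and objects that survive a collection are promoted to the next, older generation. Older generations are collected less often: threshold0 is the allocation count that triggers a generation-0 collection, and generation 1 is examined as well once generation 0 has been examined more than threshold1 times (generation 2 follows a more involved rule, as the quote notes). The long-standing defaults are (700, 10, 10); recent CPython releases have reworked parts of this design, so check the gc documentation for your version.
- For example, suppose 1,400 new objects are created over time. They are all placed in generation 0. (Only objects tracked by the GC count here, i.e. containers such as lists, dicts, and class instances.)
- Suppose 700 of them are freed along the way because their reference count reached 0. Reference counting frees these immediately, and they count as deallocations.
- The counter (allocations minus deallocations) is now 1,400 - 700 = 700, which does not yet exceed threshold0 (700). As soon as one more object is allocated, the counter exceeds the threshold and generation 0 is examined: unreachable objects (garbage cycles) are freed, and the surviving objects are promoted to generation 1.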
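To make the arithmetic concrete, here is a toy model of the trigger rules described in the quote above, replaying the 1,400 / 700 example. It is only an illustration: ToyGenerationCounter is a hypothetical class that tracks counters, not objects, and CPython's real implementation differs in detail.

```python
class ToyGenerationCounter:
    """Toy model of the quoted trigger rules; not CPython's actual code."""

    def __init__(self, threshold0=700, threshold1=10):
        self.threshold0 = threshold0
        self.threshold1 = threshold1
        self.count = 0      # allocations minus deallocations since the last gen-0 collection
        self.gen0_runs = 0  # gen-0 collections since generation 1 was last examined
        self.log = []

    def allocate(self):
        self.count += 1
        if self.count > self.threshold0:       # "exceeds threshold0"
            self._collect()

    def deallocate(self):
        self.count -= 1                        # object freed by reference counting

    def _collect(self):
        self.count = 0
        if self.gen0_runs > self.threshold1:   # gen 0 examined more than threshold1 times
            self.gen0_runs = 0
            self.log.append("gen 0 + gen 1")
        else:
            self.gen0_runs += 1
            self.log.append("gen 0")


model = ToyGenerationCounter()
for _ in range(700):
    model.allocate()    # a temporary object is created...
    model.deallocate()  # ...and freed right away by reference counting
    model.allocate()    # a longer-lived object is created
print(model.count, model.log)  # 700 []  (1,400 allocations - 700 deallocations)

model.allocate()        # the counter now exceeds threshold0
print(model.log)        # ['gen 0']
```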
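According to the gc module documentation, Python provides an interface for controlling garbage collection manually:
- gc.disable() turns off automatic collection (gc.enable() turns it back on); gc.collect() can be called whether or not automatic collection is enabled.
- gc.get_objects() and gc.collect() let you list the objects tracked by the collector (optionally per generation) or run a collection manually.
- gc.get_threshold() and gc.set_threshold() let you read or set the collection threshold of each generation.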
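A sketch of manual collection with the gc module: the two illustrative Node objects form a reference cycle, so reference counting alone never frees them, but gc.collect() does. Automatic collection is disabled only for the duration of the demo, to keep it deterministic, and re-enabled afterwards. The number returned by gc.collect() depends on the Python version, so no exact value is shown.

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.partner = None


def make_cycle():
    a, b = Node(), Node()
    a.partner, b.partner = b, a  # a <-> b: each keeps the other's refcount above 0
    return weakref.ref(a)        # a weak reference does not keep a alive


gc.disable()                     # no automatic collection during the demo
try:
    ref = make_cycle()
    print(ref() is not None)     # True: the cycle survives after make_cycle() returns
    found = gc.collect()         # run a full collection manually
    print(ref() is None)         # True: the collector freed the cycle
    print(found)                 # number of unreachable objects found (version-dependent)
finally:
    gc.enable()

print(gc.get_threshold())        # current per-generation thresholds
```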
