Pydantic automatically validates and serializes the JSON data you consume or produce.

Introduction

Pydantic is a data validation library built on Python type hints (the typing module).

Install the pydantic library:

$ poetry add pydantic

Models

A model is a class that inherits from BaseModel and declares its fields as type-annotated class attributes.

Models are very similar to a @dataclass, except that they are designed for:

  1. Validating and serializing JSON data.
  2. Generating JSON schemas.
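The difference from a dataclass can be seen in a small sketch. The PointData and PointModel classes are hypothetical names chosen for illustration: the dataclass keeps a badly typed value, while the model rejects it and can also describe itself as a JSON schema.

```python
from dataclasses import dataclass

from pydantic import BaseModel, ValidationError


@dataclass
class PointData:  # hypothetical example class
    x: int
    y: int


class PointModel(BaseModel):  # hypothetical example class
    x: int
    y: int


# A standard dataclass stores whatever it receives; annotations are not enforced.
loose = PointData(x="not a number", y=2)  # a type checker would flag this line
assert loose.x == "not a number"

# The pydantic model validates the same input and rejects it.
try:
    PointModel(x="not a number", y=2)
    rejected = False
except ValidationError:
    rejected = True

# The model can also produce a JSON schema describing its fields.
schema = PointModel.model_json_schema()
print(rejected, schema["required"])
```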

Validation and serialization run in pydantic-core, which is written in Rust; JSON parsing uses jiter, a library that is also written in Rust.

Below is an example of a User class that inherits from BaseModel and defines its fields as annotated attributes:

from pydantic import BaseModel

class User(BaseModel):
    id: int
    name: str | None = None

The model can then be instantiated:

user: User = User(id=1, name="David")

Initialization of the object will perform all parsing and validation.

If no ValidationError exception is raised, you know the resulting model instance is valid:

assert user.id == 1
assert user.name == "David" 

However, if you write the following code, mypy will report it as an error, because the required field id is missing:

from pydantic import BaseModel

class User(BaseModel):
    id: int
    name: str | None = None

david: User = User(name="David")

Pydantic also raises an error at runtime:

> python test.py
...
pydantic_core._pydantic_core.ValidationError: 1 validation error for User
id
  Field required [type=missing, input_value={'name': 'David'}, input_type=dict]

A single exception will be raised regardless of the number of errors found, and that validation error will contain information about all of the errors and how they happened.

By default, models are mutable and field values can be changed through attribute assignment:

user.id = 321
assert user.id == 321

Validating data

Pydantic keeps a model's data in a dict, so you can create a User directly from a dict by unpacking it with **.

If you create objects from data that comes from external systems, there is no guarantee that the data is correct:

from pydantic import BaseModel
from typing import Any

class User(BaseModel):
    id: int
    name: str | None = None

data: Any = {"id": 1, "name": "David"}
User(**data)

data = {"name": "apple", "price": 3} 
User(**data)  # Validation error: the required field id is missing

Pydantic provides three methods on model classes for parsing data:

1.- model_validate()

This is very similar to the __init__ method of the model, except it takes a dictionary or an object rather than keyword arguments. If the object passed cannot be validated, or if it's not a dictionary or instance of the model in question, a ValidationError will be raised.

from datetime import datetime
from pydantic import BaseModel, ValidationError

class User(BaseModel):
    id: int
    name: str = 'John Doe'
    signup_ts: datetime | None = None

user = User.model_validate({'id': 123, 'name': 'James'})
print(user)
#> id=123 name='James' signup_ts=None

try:
    User.model_validate(['not', 'a', 'dict'])
except ValidationError as e:
    print(e)
    """
    1 validation error for User
      Input should be a valid dictionary or instance of User [type=model_type, input_value=['not', 'a', 'dict'], input_type=list]
    """

2.- model_validate_json()

This validates the provided data as a JSON string or bytes object. If your incoming data is a JSON payload, this is generally faster than parsing it into a dictionary yourself and then validating the dictionary.

user = User.model_validate_json('{"id": 123, "name": "James"}')
print(user)
#> id=123 name='James' signup_ts=None

try:
    user = User.model_validate_json('{"id": 123, "name": 123}')
except ValidationError as e:
    print(e)
    """
    1 validation error for User
    name
      Input should be a valid string [type=string_type, input_value=123, input_type=int]
    """

try:
    user = User.model_validate_json('invalid JSON')
except ValidationError as e:
    print(e)
    """
    1 validation error for User
      Invalid JSON: expected value at line 1 column 1 [type=json_invalid, input_value='invalid JSON', input_type=str]
    """

3.- model_validate_strings()

This takes a dictionary (can be nested) with string keys and values and validates the data in JSON mode so that said strings can be coerced into the correct types.

user = User.model_validate_strings({'id': '123', 'name': 'James'})
print(user)
#> id=123 name='James' signup_ts=None

user = User.model_validate_strings(
    {'id': '123', 'name': 'James', 'signup_ts': '2024-04-01T12:00:00'}
)
print(user)
#> id=123 name='James' signup_ts=datetime.datetime(2024, 4, 1, 12, 0)

try:
    user = User.model_validate_strings(
        {'id': '123', 'name': 'James', 'signup_ts': '2024-04-01'}, strict=True
    )
except ValidationError as e:
    print(e)
    """
    1 validation error for User
    signup_ts
      Input should be a valid datetime, invalid datetime separator, expected `T`, `t`, `_` or space [type=datetime_parsing, input_value='2024-04-01', input_type=str]
    """

Serialization

A model instance can be serialized to a dict with the model_dump method. For a freshly created User(id=1, name="David") instance:

assert user.model_dump() == {'id': 1, 'name': 'David'}

The .model_dump_json() method serializes a model directly to a JSON-encoded string that is equivalent to the result produced by .model_dump().

from datetime import datetime

from pydantic import BaseModel


class BarModel(BaseModel):
    whatever: int


class FooBarModel(BaseModel):
    foo: datetime
    bar: BarModel


m = FooBarModel(foo=datetime(2032, 6, 1, 12, 13, 14), bar={'whatever': 123})
print(m.model_dump_json())
#> {"foo":"2032-06-01T12:13:14","bar":{"whatever":123}}
print(m.model_dump_json(indent=2))
"""
{
  "foo": "2032-06-01T12:13:14",
  "bar": {
    "whatever": 123
  }
}
"""

Nested models

A model can use other models as field types.

Given this diagram:

classDiagram
direction LR

class Order {
    id: int
}

class Client {
    id: int
    name: str
}

Order --> Client

You can write this code:

from pydantic import BaseModel

class Client(BaseModel):
    id: int
    name: str

class Order(BaseModel):
    id: int
    client: Client


data = {"id": 1, "client": {"id": 45, "name": "David"}}
order: Order = Order.model_validate(data)

assert order.client.id == 45
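Nested models are validated and serialized recursively. The sketch below reuses the Client and Order classes with made-up input data: model_dump returns nested dicts, and an error inside the nested client is reported with a location that includes the parent field.

```python
from pydantic import BaseModel, ValidationError


class Client(BaseModel):
    id: int
    name: str


class Order(BaseModel):
    id: int
    client: Client


order = Order.model_validate({"id": 1, "client": {"id": 45, "name": "David"}})

# Serialization walks into the nested model as well.
dumped = order.model_dump()

# Hypothetical bad input: the client id is not a number and the name is missing.
locations = []
try:
    Order.model_validate({"id": 1, "client": {"id": "x"}})
except ValidationError as exc:
    # Each location is a path: parent field first, then the nested field.
    locations = sorted(err["loc"] for err in exc.errors())

print(dumped, locations)
```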

Activity

Write the classes that correspond to this diagram:

classDiagram
direction LR

class Order {
    id: int
}

class Client {
    id: int
    name: str
}

class Product {
    id: int
    name: str
    price: float
}

class OrderItem {
    quantity: int
}

Order --> Client
Order --> "1..*" OrderItem
OrderItem --> Product

TODO

Create an Order object from a dict:

TODO

Field

The Field function is used to customize and add metadata to fields of models.

Numeric Constraints

There are some keyword arguments that can be used to constrain numeric values:

  • gt - greater than
  • lt - less than
  • ge - greater than or equal to
  • le - less than or equal to
  • multiple_of - a multiple of the given number
  • allow_inf_nan - allow 'inf', '-inf', 'nan' values

Here's an example:

from pydantic import BaseModel, Field

class Foo(BaseModel):
    positive: int = Field(gt=0)
    non_negative: int = Field(ge=0)
    negative: int = Field(lt=0)
    non_positive: int = Field(le=0)
    even: int = Field(multiple_of=2)
    love_for_pydantic: float = Field(allow_inf_nan=True)

foo = Foo(
    positive=1,
    non_negative=0,
    negative=-1,
    non_positive=0,
    even=2,
    love_for_pydantic=float('inf'),
)
print(foo)
"""
positive=1 non_negative=0 negative=-1 non_positive=0 even=2 love_for_pydantic=inf
"""

String Constraints

The following keyword arguments can be used to constrain strings:

  • min_length: Minimum length of the string.
  • max_length: Maximum length of the string.
  • pattern: A regular expression that the string must match.

Here's an example:

from pydantic import BaseModel, Field

class Foo(BaseModel):
    short: str = Field(min_length=3)
    long: str = Field(max_length=10)
    regex: str = Field(pattern=r'^\d*$')  

foo = Foo(short='foo', long='foobarbaz', regex='123')
print(foo)
#> short='foo' long='foobarbaz' regex='123'

Immutability

The parameter frozen is used to emulate the frozen dataclass behaviour. It is used to prevent the field from being assigned a new value after the model is created (immutability).

from pydantic import BaseModel, Field, ValidationError

class User(BaseModel):
    name: str = Field(frozen=True)
    age: int

user = User(name='John', age=42)

try:
    user.name = 'Jane'  

except ValidationError as e:
    print(e)
    """
    1 validation error for User
    name
      Field is frozen [type=frozen_field, input_value='Jane', input_type=str]
    """

More information at https://docs.pydantic.dev/latest/concepts/fields/

JSON

Parsing

Pydantic provides built-in JSON parsing, so you can consume JSON data directly with pydantic.

In this example, you request strict validation through model_config:

from datetime import date
from pydantic import BaseModel, ConfigDict, ValidationError

class Event(BaseModel):
    model_config = ConfigDict(strict=True)

    when: date
    where: tuple[int, int]

data: str = '{"when": "1987-01-28", "where": [51, -1]}'
event: Event = Event.model_validate_json(data)
assert event.where[0] == 51 
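Strict mode behaves differently for JSON input and for Python input, because JSON has no date or tuple types. The following sketch, using the same Event model with made-up inputs, shows that the JSON string is accepted while the equivalent Python dict is rejected, and that a JSON string is still not accepted where an int is expected.

```python
from datetime import date

from pydantic import BaseModel, ConfigDict, ValidationError


class Event(BaseModel):
    model_config = ConfigDict(strict=True)

    when: date
    where: tuple[int, int]


# JSON input: an ISO date string and an array are accepted in strict mode.
event = Event.model_validate_json('{"when": "1987-01-28", "where": [51, -1]}')

# Python input: strict mode wants a real date and a real tuple.
python_error_fields = []
try:
    Event.model_validate({"when": "1987-01-28", "where": [51, -1]})
except ValidationError as exc:
    python_error_fields = [err["loc"][0] for err in exc.errors()]

# Even from JSON, strict mode does not coerce strings into ints.
json_rejected = False
try:
    Event.model_validate_json('{"when": "1987-01-28", "where": ["51", "-1"]}')
except ValidationError:
    json_rejected = True

print(event, python_error_fields, json_rejected)
```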

More information at https://docs.pydantic.dev/latest/concepts/json/

TODO