
NDJSON and JSON Lines

Learning Focus

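A standard JSON document, such as one large array, generally has to be parsed in full before its contents can be used, unless a specialized streaming parser is employed. NDJSON avoids this by placing one self-contained JSON value per line, so even terabyte-scale files can be processed one record at a time, with memory bounded by the longest line rather than by the file size.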

What is NDJSON?

NDJSON (Newline Delimited JSON), also known as JSON Lines (.jsonl):

  • Each line is a complete, independent JSON value
  • Lines separated by \n
  • Appendable without re-parsing the whole file
  • No outer array wrapper [ ... ]
events.ndjson
{"event": "user.signup", "userId": 1, "ts": "2024-01-15T08:00:00Z"}
{"event": "user.login", "userId": 1, "ts": "2024-01-15T09:30:00Z"}
{"event": "order.created", "userId": 2, "ts": "2024-01-15T10:00:00Z"}
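
To illustrate the "appendable" property, here is a minimal Python sketch. The helper name `append_record`, the temporary demo path, and the `user.logout` event are assumptions made for illustration. It relies on the fact that `json.dumps()` without `indent` never emits a raw newline, so each record stays on one line.

```python
import json
import os
import tempfile

def append_record(path, record):
    """Append one record as a single NDJSON line; existing lines are never read."""
    # Mode "a" places every write at the end of the file.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

# Demo in a temporary directory so the sketch does not touch real data.
demo_path = os.path.join(tempfile.mkdtemp(), "events.ndjson")
append_record(demo_path, {"event": "user.signup", "userId": 1, "ts": "2024-01-15T08:00:00Z"})
append_record(demo_path, {"event": "user.logout", "userId": 1, "ts": "2024-01-15T11:00:00Z"})

with open(demo_path, encoding="utf-8") as f:
    line_count = sum(1 for _ in f)
print(line_count)  # 2
```

Doing the same with a JSON array would mean reading the file, adding the element, and rewriting the closing bracket.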

Standard JSON vs NDJSON

| Aspect | JSON Array | NDJSON |
| --- | --- | --- |
| Must read entire file | Yes (with a standard parser) | No, stream line by line |
| Appendable | No (requires rewrite) | Yes, just append a line |
| Memory for 10 GB file | ~10 GB or more | Roughly one line at a time (e.g. ~1 KB) |
| Primary use | Small payloads | Logs, pipelines, bulk export |
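
The relationship between the two layouts can be sketched as a one-time conversion. The function name `array_to_ndjson` and the in-memory sample are hypothetical. Note that this sketch uses the standard `json.load()`, so the conversion itself still holds the whole array in memory; only the resulting NDJSON output can later be streamed.

```python
import io
import json

def array_to_ndjson(src, dst):
    """Convert a file-like object holding one JSON array into NDJSON lines."""
    records = json.load(src)  # loads the entire array at once
    for record in records:
        dst.write(json.dumps(record) + "\n")
    return len(records)

# In-memory stand-ins for real files.
src = io.StringIO('[{"id": 1}, {"id": 2}]')
dst = io.StringIO()
converted = array_to_ndjson(src, dst)
print(dst.getvalue(), end="")
```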

Processing with jq

# Filter login events only (-c keeps the output as one JSON value per line)
jq -c 'select(.event == "user.login")' events.ndjson

# Count by event type
jq -r '.event' events.ndjson | sort | uniq -c

# Transform to CSV
jq -r '[.event, .userId, .ts] | @csv' events.ndjson > events.csv
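
For environments without jq, the "count by event type" pipeline can be approximated in Python. This is a sketch, not a drop-in replacement: `count_events` is a hypothetical helper, and it assumes every record has an `event` key, as in the sample file.

```python
import json
from collections import Counter

def count_events(lines):
    """Count records per "event" value from an iterable of NDJSON lines."""
    counts = Counter()
    for line in lines:
        if line.strip():  # skip blank lines
            counts[json.loads(line)["event"]] += 1
    return counts

# The three sample records from events.ndjson, inlined for the demo.
sample = [
    '{"event": "user.signup", "userId": 1, "ts": "2024-01-15T08:00:00Z"}\n',
    '{"event": "user.login", "userId": 1, "ts": "2024-01-15T09:30:00Z"}\n',
    '{"event": "order.created", "userId": 2, "ts": "2024-01-15T10:00:00Z"}\n',
]
counts = count_events(sample)
for event, n in counts.most_common():
    print(n, event)
```

Because an open file is itself an iterable of lines, a file object from `open("events.ndjson")` can be passed to `count_events` directly.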

Processing in Python

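import json

# Write NDJSON: one compact JSON value per line
# (`records` is any iterable of JSON-serializable objects)
with open("events.ndjson", "w", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")

# Read NDJSON one line at a time, so memory stays bounded by the longest line
# (`process` stands for your own per-record handler)
with open("events.ndjson", "r", encoding="utf-8") as f:
    for line in f:
        if line.strip():  # skip blank lines
            event = json.loads(line)
            process(event)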

DuckDB Integration

-- Direct SQL on NDJSON files
SELECT event, COUNT(*) AS count
FROM read_ndjson_auto('events.ndjson')
GROUP BY event ORDER BY count DESC;

Concept Map

Concept Flow

NDJSON File
├── Line 1: { JSON object }
├── Line 2: { JSON object }
└── Line N: { JSON object }
    └── Stream / Iterator
        └── One record at a time → Constant memory footprint
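
The stream/iterator idea in the diagram can be sketched as a Python generator. The names `iter_ndjson` and `endless_lines` are hypothetical. The unbounded source stands in for a very large file or a network stream, to show that only the records actually requested are ever parsed.

```python
import json
from itertools import count, islice

def iter_ndjson(lines):
    """Lazily yield one parsed record per non-blank line."""
    for line in lines:
        if line.strip():
            yield json.loads(line)

def endless_lines():
    """Hypothetical unbounded source, standing in for a huge file or a socket."""
    for n in count():
        yield json.dumps({"n": n}) + "\n"

# Only three lines are ever generated and parsed, although the source never ends.
first_three = list(islice(iter_ndjson(endless_lines()), 3))
print(first_three)  # [{'n': 0}, {'n': 1}, {'n': 2}]
```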

Common Pitfalls

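| Pitfall | Consequence | Prevention |
| --- | --- | --- |
| json.load() on NDJSON | Parse error: a multi-record file is not a single valid JSON document | Read line by line with json.loads() |
| Missing newline at EOF | Last record may be ignored by some line-oriented tools | Ensure every line, including the last, ends with \n |
| Blank lines in file | JSONDecodeError on empty string | Skip blank lines: if line.strip(): |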
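
The pitfalls above can be reproduced and guarded against in a few lines. This is a sketch under assumptions: `read_ndjson` is a hypothetical helper, and reporting the line number through a `ValueError` is one possible design choice rather than an established convention.

```python
import io
import json

def read_ndjson(stream):
    """Yield (line_number, record), skipping blank lines.

    Malformed lines raise ValueError that names the offending line.
    """
    for line_number, line in enumerate(stream, start=1):
        if not line.strip():
            continue
        try:
            yield line_number, json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_number}: {exc.msg}") from exc

# Two records, a blank line between them, and no newline at EOF.
text = '{"a": 1}\n\n{"a": 2}'

# Pitfall: json.load() expects exactly one JSON document.
try:
    json.load(io.StringIO(text))
    whole_file_ok = True
except json.JSONDecodeError:
    whole_file_ok = False

records = list(read_ndjson(io.StringIO(text)))
print(whole_file_ok)  # False
print(records)        # [(1, {'a': 1}), (3, {'a': 2})]
```

Python's line iteration still returns a final line that lacks a trailing `\n`, so the last record is read here; other line-oriented tools may behave differently, which is why writers should always terminate the last line.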
