NDJSON and JSON Lines
Learning Focus
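A standard JSON document is a single value (often one large array), so a typical parser must read all of it before it can return the first record. NDJSON places one self-contained JSON value on each line. A reader can then process records one at a time, with memory bounded by the longest line rather than by the file size, which keeps very large files (gigabytes to terabytes) practical to process.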
What is NDJSON?
NDJSON (Newline Delimited JSON), also known as JSON Lines (.jsonl):
- Each line is a complete, independent JSON value
- Lines are separated by a newline character (\n)
- New records can be appended without re-parsing the whole file
- No outer array wrapper ([ ... ])
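The one-value-per-line rule holds even when a string contains a newline, because the default json.dumps output escapes control characters inside strings. A minimal sketch using only the standard library; the record below is hypothetical.

```python
import json

# Hypothetical record whose "text" field contains a real newline character
record = {"event": "note.added", "text": "line one\nline two"}

# Default json.dumps (no indent=) produces compact output and escapes the
# newline inside the string as the two characters backslash + n
line = json.dumps(record)
print(line)

# The serialized record therefore occupies one physical line and round-trips intact
print("\n" in line)               # False
print(json.loads(line) == record) # True
```

Passing indent=2 would break this property, since pretty-printing inserts newlines between fields.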
Example file (events.ndjson):

```
{"event": "user.signup", "userId": 1, "ts": "2024-01-15T08:00:00Z"}
{"event": "user.login", "userId": 1, "ts": "2024-01-15T09:30:00Z"}
{"event": "order.created", "userId": 2, "ts": "2024-01-15T10:00:00Z"}
```
Standard JSON vs NDJSON
| Aspect | JSON Array | NDJSON |
|---|---|---|
| Must parse entire file before the first record | Yes, with a standard (non-streaming) parser | No, stream line by line |
| Appendable | No (the closing ] forces a rewrite) | Yes, just append a line |
| Memory to read a 10 GB file | At least ~10 GB, typically more once parsed into objects | About one line (one record) at a time |
| Primary use | Small payloads | Logs, pipelines, bulk export |
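The "Appendable" row is easiest to see side by side. The sketch below is illustrative only: append_to_json_array and append_to_ndjson are hypothetical helper names, and demo.json / demo.ndjson are throwaway files. Adding one record to a JSON array means loading and rewriting everything, while NDJSON only writes the new line.

```python
import json
import os

def append_to_json_array(path, record):
    """JSON array: parse every existing record, then rewrite the whole file."""
    items = []
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            items = json.load(f)  # entire array is loaded into memory
    items.append(record)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(items, f)  # entire file is rewritten

def append_to_ndjson(path, record):
    """NDJSON: open in append mode and write one line; existing data is untouched."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

# Start from empty files so the demo is repeatable
for p in ("demo.json", "demo.ndjson"):
    if os.path.exists(p):
        os.remove(p)

for i in range(3):
    append_to_json_array("demo.json", {"id": i})
    append_to_ndjson("demo.ndjson", {"id": i})
```

Both files end up holding the same three records, but the cost of the array version grows with the file on every call.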
Processing with jq
```shell
# Filter login events only (-c keeps each result on one line, so the output is still NDJSON)
jq -c 'select(.event == "user.login")' events.ndjson

# Count by event type
jq -r '.event' events.ndjson | sort | uniq -c

# Transform to CSV
jq -r '[.event, .userId, .ts] | @csv' events.ndjson > events.csv
```
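Where jq is unavailable, the same three operations can be sketched with the Python standard library. This is an approximation, not a port of jq: read_events is a hypothetical helper, and csv.QUOTE_NONNUMERIC is chosen because, like @csv, it quotes strings and leaves numbers bare. The block recreates the example file so it runs on its own.

```python
import csv
import json
from collections import Counter

# Recreate the example file so this sketch is self-contained
sample = [
    {"event": "user.signup", "userId": 1, "ts": "2024-01-15T08:00:00Z"},
    {"event": "user.login", "userId": 1, "ts": "2024-01-15T09:30:00Z"},
    {"event": "order.created", "userId": 2, "ts": "2024-01-15T10:00:00Z"},
]
with open("events.ndjson", "w", encoding="utf-8") as f:
    f.writelines(json.dumps(r) + "\n" for r in sample)

def read_events(path):
    """Yield one parsed record per non-blank line."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)

# jq -c 'select(.event == "user.login")'
logins = [e for e in read_events("events.ndjson") if e["event"] == "user.login"]

# jq -r '.event' | sort | uniq -c
counts = Counter(e["event"] for e in read_events("events.ndjson"))

# jq -r '[.event, .userId, .ts] | @csv' > events.csv
with open("events.csv", "w", newline="", encoding="utf-8") as out:
    writer = csv.writer(out, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    for e in read_events("events.ndjson"):
        writer.writerow([e["event"], e["userId"], e["ts"]])
```

The logins list holds every match in memory; for large result sets, iterate over the generator expression instead of building a list.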
Processing in Python
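```python
import json

records = [  # sample data (a subset of the example file above)
    {"event": "user.signup", "userId": 1, "ts": "2024-01-15T08:00:00Z"},
    {"event": "user.login", "userId": 1, "ts": "2024-01-15T09:30:00Z"},
]

def process(event):
    print(event["event"])  # placeholder for real per-record work

# Write NDJSON: json.dumps without indent= emits exactly one line per record
with open("events.ndjson", "w", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")

# Read NDJSON: constant memory, one line at a time
with open("events.ndjson", "r", encoding="utf-8") as f:
    for line in f:
        if line.strip():  # skip blank lines
            event = json.loads(line)
            process(event)
```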
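In practice a single malformed line should not silently abort a multi-gigabyte job, and the error should say which line failed. A sketch of a more defensive reader, assuming a hypothetical iter_ndjson helper with a strict flag (neither is a standard library API): it reports path and line number, and can optionally skip bad lines.

```python
import json
import sys

def iter_ndjson(path, strict=True):
    """Yield (line_number, record) pairs from an NDJSON file.

    Blank lines are skipped. A malformed line raises ValueError naming the
    file and line when strict=True; otherwise it is reported on stderr and skipped.
    """
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                if strict:
                    raise ValueError(f"{path}:{lineno}: {exc.msg}") from exc
                print(f"skipping {path}:{lineno}: {exc.msg}", file=sys.stderr)
                continue
            yield lineno, record

# Demo file: a valid line, a blank line, a truncated record, another valid line
with open("mixed.ndjson", "w", encoding="utf-8") as f:
    f.write('{"id": 1}\n\n{"id": 2\n{"id": 3}\n')

good = [rec for _, rec in iter_ndjson("mixed.ndjson", strict=False)]
print(good)  # [{'id': 1}, {'id': 3}]
```

Parsing happens before the yield, outside the try block, so exceptions raised by the consumer are never mistaken for parse errors.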
DuckDB Integration
```sql
-- Direct SQL on NDJSON files
SELECT event, COUNT(*) AS count
FROM read_ndjson_auto('events.ndjson')
GROUP BY event
ORDER BY count DESC;
```
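DuckDB is a separate install. If only the Python standard library is available, the same aggregation can be approximated by streaming the file into an in-memory SQLite table. This swaps in sqlite3 and is not how DuckDB reads NDJSON internally; event_rows and the events table layout are assumptions for illustration.

```python
import json
import sqlite3

# Recreate the example file so the sketch is self-contained
rows = [
    {"event": "user.signup", "userId": 1, "ts": "2024-01-15T08:00:00Z"},
    {"event": "user.login", "userId": 1, "ts": "2024-01-15T09:30:00Z"},
    {"event": "order.created", "userId": 2, "ts": "2024-01-15T10:00:00Z"},
]
with open("events.ndjson", "w", encoding="utf-8") as f:
    f.writelines(json.dumps(r) + "\n" for r in rows)

def event_rows(path):
    """Yield (event, userId, ts) tuples, parsing one line at a time."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                rec = json.loads(line)
                yield rec["event"], rec["userId"], rec["ts"]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (event TEXT, user_id INTEGER, ts TEXT)")
# executemany accepts an iterator, so no intermediate list of records is built
con.executemany("INSERT INTO events VALUES (?, ?, ?)", event_rows("events.ndjson"))

result = con.execute(
    "SELECT event, COUNT(*) AS count FROM events GROUP BY event ORDER BY count DESC"
).fetchall()
print(result)
```

Unlike read_ndjson_auto, this version needs the column names and types spelled out by hand.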
Concept Map
Concept Flow

```
NDJSON File
├── Line 1: { JSON object }
├── Line 2: { JSON object }
└── Line N: { JSON object }
        │
        ▼
Stream / Iterator
└── One record at a time → Constant memory footprint
```
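The flow above maps naturally onto chained Python generators: each stage pulls one record from the previous stage only when asked. A minimal sketch; parse and only are hypothetical helper names, and io.StringIO stands in for an open file.

```python
import io
import json

def parse(lines):
    """Stage 1: turn raw lines into records, skipping blank lines."""
    for line in lines:
        if line.strip():
            yield json.loads(line)

def only(event_name, records):
    """Stage 2: keep records whose "event" field matches."""
    for rec in records:
        if rec.get("event") == event_name:
            yield rec

# io.StringIO stands in for a file; a real file object is iterated the same way
source = io.StringIO(
    '{"event": "user.login", "userId": 1}\n'
    '{"event": "user.signup", "userId": 2}\n'
    '{"event": "user.login", "userId": 3}\n'
)

pipeline = only("user.login", parse(source))  # nothing has been read yet
first = next(pipeline)  # reads only as many lines as needed for one match
print(first)
```

Because nothing is materialized between stages, adding more filters or transforms does not increase the number of records held in memory.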
Common Pitfalls
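| Pitfall | Consequence | Prevention |
|---|---|---|
| json.load() on a multi-record NDJSON file | JSONDecodeError ("Extra data"): the file as a whole is not one JSON value | Read line by line and call json.loads() on each line |
| Missing newline at EOF | Line-based tools such as a shell while read loop can skip the last record, and a later append glues the new record onto it | End every record, including the last, with \n |
| Blank lines in file | json.loads() raises JSONDecodeError on an empty or whitespace-only line | Skip blank lines: if line.strip(): |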
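The three pitfalls can be reproduced in a few lines of standard-library Python. In this sketch, safe_append is a hypothetical helper (not a library function) that checks the file's last byte and adds a missing newline before appending; no_eol.ndjson is a throwaway demo file.

```python
import json
import os

ndjson_text = '{"id": 1}\n{"id": 2}\n'

# Pitfall 1: parsing the whole file as one JSON document fails
try:
    json.loads(ndjson_text)
except json.JSONDecodeError as exc:
    print("whole-file parse failed:", exc.msg)  # Extra data

# Pitfall 3: an empty line is not valid JSON
try:
    json.loads("\n")
except json.JSONDecodeError as exc:
    print("blank line failed:", exc.msg)  # Expecting value

# Pitfall 2: appending to a file whose last record lacks a trailing newline
def safe_append(path, record):
    """Hypothetical helper: add a missing trailing newline, then append one record."""
    needs_newline = False
    if os.path.exists(path) and os.path.getsize(path) > 0:
        with open(path, "rb") as f:
            f.seek(-1, os.SEEK_END)  # look at the final byte only
            needs_newline = f.read(1) != b"\n"
    with open(path, "a", encoding="utf-8") as f:
        if needs_newline:
            f.write("\n")
        f.write(json.dumps(record) + "\n")

with open("no_eol.ndjson", "w", encoding="utf-8") as f:
    f.write('{"id": 1}')  # deliberately no trailing newline
safe_append("no_eol.ndjson", {"id": 2})
```

Without the newline check, the append would produce the single invalid line {"id": 1}{"id": 2}.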
What's Next
- Next: Binary JSON Formats — BSON, MessagePack, and Protocol Buffers.
- Section Overview — Return to the Advanced module index.