Settings

Spark session

This library uses session_handler.State to provide universal access to the same Spark session for all modules. A default session is created automatically and can be accessed via the session attribute.

from replay.session_handler import State

# State is a singleton; its session attribute holds the default Spark session
State().session

There is also a helper function that provides basic settings for creating a Spark session:

replay.session_handler.get_spark_session(spark_memory=None, shuffle_partitions=None)

Get default SparkSession

Parameters
  • spark_memory (Optional[int]) – GB of memory allocated for Spark; 70% of RAM by default.

  • shuffle_partitions (Optional[int]) – number of shuffle partitions for Spark; three times the CPU count by default.

Return type

SparkSession
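
For example, to create a session with explicit settings (the values below are purely illustrative):

from replay.session_handler import get_spark_session

# 4 GB of memory and 12 shuffle partitions are arbitrary example values
session = get_spark_session(spark_memory=4, shuffle_partitions=12)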

You can pass any Spark session to State to make it available throughout the library.

from replay.session_handler import State, get_spark_session

# create a session with 2 GB of memory and register it as the global one
session = get_spark_session(2)
State(session)

class replay.session_handler.State(session=None, device=None)

All modules look for the Spark session via this class. You can put your own session here.

Other parameters are stored here too, such as the default device for PyTorch (CPU/CUDA).
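
For example, judging by the signature above, a PyTorch device can be passed to State; this sketch assumes PyTorch is installed and a CUDA device is available:

import torch
from replay.session_handler import State

# store a CUDA device in State so PyTorch-based models can pick it up
# (assumes torch.device is an accepted value for the device parameter)
State(device=torch.device("cuda:0"))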

Logging

The logger name is replay. The default level is logging.INFO.

import logging

# switch the library's logger to verbose output
logger = logging.getLogger("replay")
logger.setLevel(logging.DEBUG)
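
To keep a persistent record of log messages, you can attach a handler using the standard logging machinery (the file name below is illustrative):

import logging

logger = logging.getLogger("replay")
file_handler = logging.FileHandler("replay.log")  # illustrative file name
file_handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
logger.addHandler(file_handler)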