{ "cells": [ { "cell_type": "markdown", "id": "1491d6fc", "metadata": {}, "source": [ "# Specifying Data" ] }, { "cell_type": "markdown", "id": "b44c8011", "metadata": {}, "source": [ "Except for \"single\" problems, each problem usually represents a large (often, infinite) family of cases, called instances, that one may want to solve. All these instances are uniquely identified by some specific data.\n", "First, recall that the command to be run for generating an XCSP$^3$ instance (file), given a model and some data, is:\n", "\n", "```\n", "python <model> -data=<data>\n", "```\n", "\n", "where:\n", "- ```<model>``` is a Python file that represents a PyCSP$^3$ model\n", "- ```<data>``` represents some specific data. \n", "\n", "In our context, an *elementary* value is a value of one of these built-in data types: integer ('int'), real ('float'), string ('str') and boolean ('bool'). Specific data can be given as:\n", "- a single elementary value, as in \n", "\n", "```\n", "-data=5\n", "```\n", "- a list of elementary values, between square (or round) brackets (according to the operating system, one might need to escape brackets) and with a comma used as separator, as in \n", "\n", "```\n", "-data=[9,0,0,3,9]\n", "```\n", "\n", "- a list of named elementary values, between square (or round) brackets and with a comma used as separator, as in\n", "\n", "```\n", "-data=[v=9,b=0,r=0,k=3,l=9]\n", "```\n", "\n", "- a JSON file (possibly given by a URL), as in \n", "\n", "```\n", "-data=Bibd-9-3-9.json\n", "```\n", "\n", "- a text file (i.e., a non-JSON file in any arbitrary format), while providing, with the option ```-dataparser```, some Python code to load it, as in\n", "\n", "```\n", "-data=puzzle.txt -dataparser=ParserPuzzle.py\n", "```\n", "\n", "\n", "Then, **data can be directly used in PyCSP$^3$ models by means of a predefined variable called *data***. The value of the predefined PyCSP$^3$ variable ```data``` is set as follows:\n", "- if the option ```-data``` is not specified, or if it is 
specified as ```-data=null``` or ```-data=None```, then the value of ```data``` is *None*. See, for example, the Sudoku problem.\n", "- if a single elementary value is given (possibly, between brackets), then the value of ```data``` is directly this value. See, for example, the Golomb Ruler problem.\n", "- if a JSON file containing a root object with only one field is given, then the value of ```data``` is directly the value of this field. \n", "- if a list of (at least two) elementary values is given, then the value of ```data``` is a tuple containing those values in sequence. See, for example, the Board Coloration problem.\n", "- if a list of (at least two) named elementary values is given, then the value of ```data``` is a named tuple. \n", "- if a JSON file containing a root object with at least two fields is given, then the value of ```data``` is a named tuple. Actually, any encountered JSON object in the file is (recursively) converted into a named tuple. See, for example, the Warehouse and Rack Configuration problems." 
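, "\n", "\n", "The recursive conversion of JSON objects into named tuples can be sketched in plain Python; this is only a minimal illustration of the principle (built on the standard module *collections*), not PyCSP$^3$'s actual implementation:\n", "```python\n", "import json\n", "from collections import namedtuple\n", "\n", "def to_named_tuple(obj):\n", "    # recursively turn JSON objects (dicts) into named tuples,\n", "    # descending into lists as well\n", "    if isinstance(obj, dict):\n", "        T = namedtuple('Data', list(obj.keys()))\n", "        return T(*(to_named_tuple(v) for v in obj.values()))\n", "    if isinstance(obj, list):\n", "        return [to_named_tuple(v) for v in obj]\n", "    return obj\n", "\n", "data = to_named_tuple(json.loads('{\"v\": 9, \"b\": 0, \"r\": 0, \"k\": 3, \"l\": 9}'))\n", "print(data.v, data.k)  # 9 3\n", "```"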
] }, { "cell_type": "markdown", "id": "5bf97d52", "metadata": {}, "source": [ "For example, for the AllInterval and Bibd problems, we can write:\n", "```\n", "python AllInterval.py -data=12\n", "\n", "python Bibd.py -data=[9,0,0,3,9]\n", "```" ] }, { "cell_type": "markdown", "id": "98df509d", "metadata": {}, "source": [ "## Storing Data in JSON" ] }, { "cell_type": "markdown", "id": "9ca87c78", "metadata": {}, "source": [ "Suppose that you would prefer to have a JSON file for storing these data values.\n", "You can execute:\n", "```\n", "python Bibd.py -data=[9,0,0,3,9] -dataexport\n", "```\n", "\n", "You then obtain the following JSON file 'Bibd-9-0-0-3-9.json':\n", "```\n", "{\n", " \"v\":9,\n", " \"b\":0,\n", " \"r\":0,\n", " \"k\":3,\n", " \"l\":9\n", "}\n", "```\n", "\n", "And now, to generate the same XCSP$^3$ instance (file) as above, you can execute:\n", "```\n", "python Bibd.py -data=Bibd-9-0-0-3-9.json\n", "```" ] }, { "cell_type": "markdown", "id": "e2909541", "metadata": {}, "source": [ "## Escape Characters" ] }, { "cell_type": "markdown", "id": "019e19d0", "metadata": {}, "source": [ "With some command interpreters (shells), you may have to escape the characters '[' and ']', which gives:\n", "\n", "```\n", "python Bibd.py -data=\\[9,0,0,3,9\\]\n", "```\n", "\n", "You can also use round brackets instead of square brackets: \n", "\n", "```\n", "python Bibd.py -data=(9,0,0,3,9)\n", "```\n", "\n", "If this causes a problem with the command interpreter (shell), you have to escape the characters '(' and ')', which gives:\n", "\n", "```\n", "python Bibd.py -data=\\(9,0,0,3,9\\)\n", "```" ] }, { "cell_type": "markdown", "id": "6604e565", "metadata": {}, "source": [ "**Remark.** At the Windows command line, different escape characters may be needed (for example, depending on whether or not you use Windows PowerShell). However, note that you can always run a command from a batch script file (or use a JSON file). 
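" ] }, { "cell_type": "markdown", "id": "b7c3a9e1", "metadata": {}, "source": [ "As an aside, the way bracketed ```-data``` arguments are turned into values can be mimicked in plain Python. The helper below is hypothetical (PyCSP$^3$'s real parsing is more general, also handling reals, strings and booleans); it only illustrates the principle for integer data:\n", "```python\n", "from collections import namedtuple\n", "\n", "def parse_data_arg(s):\n", "    # hypothetical helper: interpret a -data argument made of integers\n", "    items = s.strip('[]()').split(',')\n", "    if all('=' in item for item in items):\n", "        # named elementary values, e.g. [v=9,b=0,r=0,k=3,l=9]\n", "        pairs = [item.split('=') for item in items]\n", "        T = namedtuple('Data', [name for name, _ in pairs])\n", "        return T(*(int(v) for _, v in pairs))\n", "    values = [int(item) for item in items]\n", "    # a single value stays elementary; several values form a tuple\n", "    return values[0] if len(values) == 1 else tuple(values)\n", "\n", "print(parse_data_arg('5'))                        # 5\n", "print(parse_data_arg('[9,0,0,3,9]'))              # (9, 0, 0, 3, 9)\n", "print(parse_data_arg('[v=9,b=0,r=0,k=3,l=9]').k)  # 3\n", "```\n", "This mirrors the rules stated above for the value of the variable ```data```.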
" ] }, { "cell_type": "markdown", "id": "95f18939", "metadata": {}, "source": [ "## Filenames with Formatted Data" ] }, { "cell_type": "markdown", "id": "694b194f", "metadata": {}, "source": [ "As shown above, when data are given under the form of elementary values on the command line, they are integrated into the filename of the generated instance. However, it may sometimes be interesting to format such filenames a little. This is possible by using the option ```-dataformat```. The principle is that the string passed to this option will serve to apply formatting to the values in ```-data```. For example,\n", "```\n", "python Bibd.py -data=[9,0,0,3,9] -dataformat={:02d}-{:01d}-{:01d}-{:02d}-{:02d}\n", "```\n", "\n", "will generate an XCSP$^3$ file with filename 'Bibd-09-0-0-03-09.xml'.\n", "\n", "If the same pattern must be applied to all pieces of data, we can write:\n", "```\n", "python Bibd.py -data=[9,0,0,3,9] -dataformat={:02d}\n", "```\n", "\n", "so as to obtain an XCSP$^3$ file with filename 'Bibd-09-00-00-03-09.xml'." ] }, { "cell_type": "markdown", "id": "2cc54830", "metadata": {}, "source": [ "## About Using Tuple Unpacking on Data" ] }, { "cell_type": "markdown", "id": "18b02af2", "metadata": {}, "source": [ "For the BACP problem, an example of data is given by the following JSON file, called 'Bacp\_example.json':\n", "```\n", "{\n", " \"nPeriods\": 4,\n", " \"minCredits\": 2,\n", " \"maxCredits\": 5,\n", " \"minCourses\": 2,\n", " \"maxCourses\": 3,\n", " \"credits\": [2,3,1,3,2,3,3,2,1],\n", " \"prerequisites\": [[2,0],[4,1],[5,2],[6,4]]\n", "}\n", "```\n", "\n", "In the BACP model, in a file called 'Bacp.py', it is then possible to use tuple unpacking, and to get all the important data in a single statement:\n", "```\n", "nPeriods, minCredits, maxCredits, minCourses, maxCourses, credits, prereq = data\n", "```\n", "\n", "The command to execute for compiling is then:\n", "```\n", "python Bacp.py -data=Bacp_example.json\n", "```" ] }, { "cell_type": 
"markdown", "id": "27e4e5aa", "metadata": {}, "source": [ "Because tuple unpacking is used, it is important to note that the fields of the root object in the JSON file must be given in this exact order. If this is not the case, as, for example, in:\n", "```\n", "{\n", " \"nPeriods\": 4,\n", " \"prerequisites\": [[2,0],[4,1],[5,2],[6,4]],\n", " \"minCredits\": 2,\n", " \"maxCredits\": 5,\n", " \"credits\": [2,3,1,3,2,3,3,2,1],\n", " \"minCourses\": 2,\n", " \"maxCourses\": 3\n", "}\n", "```\n", "\n", "there will be a problem when unpacking data.\n", "If you want a safer model (because, for example, you have no guarantee about the way the data are generated), you must specifically refer to the fields of the named tuple instead: \n", "```\n", "nPeriods = data.nPeriods\n", "minCredits, maxCredits = data.minCredits, data.maxCredits\n", "minCourses, maxCourses = data.minCourses, data.maxCourses\n", "credits, prereq = data.credits, data.prerequisites\n", "nCourses = len(credits)\n", "```\n" ] }, { "cell_type": "markdown", "id": "76b44a1c", "metadata": {}, "source": [ "## About Using a Data Parser" ] }, { "cell_type": "markdown", "id": "d1dde718", "metadata": {}, "source": [ "Now, let us suppose that you would like to use the data from this MiniZinc file 'bacp-data.mzn':\n", "```\n", "include \"curriculum.mzn.model\";\n", "n_courses = 9;\n", "n_periods = 4;\n", "load_per_period_lb = 2;\n", "load_per_period_ub = 5;\n", "courses_per_period_lb = 2;\n", "courses_per_period_ub = 3;\n", "course_load = [2, 3, 1, 3, 2, 3, 3, 2, 1];\n", "constraint prerequisite(2, 0);\n", "constraint prerequisite(4, 1);\n", "constraint prerequisite(5, 2);\n", "constraint prerequisite(6, 4);\n", "```\n", "\n", "We need to write a piece of code in Python for building the variable *data* that will be used in our model.\n", "After importing everything from *pycsp3.problems.data.parsing*, we can use some PyCSP$^3$ functions such as *next\_line()*, *number\_in()*, *remaining\_lines()*, ... 
Here, we also use the classical function *split()* of the module *re* to parse information concerning prerequisites. Note that you have to add relevant fields to the predefined dictionary *data* (because, at this stage, *data* is a dictionary, even if it will later be automatically converted into a named tuple), as in the following file 'Bacp\_ParserZ.py':\n", "```\n", "from pycsp3.problems.data.parsing import *\n", "\n", "nCourses = number_in(next_line())\n", "data[\"nPeriods\"] = number_in(next_line())\n", "data[\"minCredits\"] = number_in(next_line())\n", "data[\"maxCredits\"] = number_in(next_line())\n", "data[\"minCourses\"] = number_in(next_line())\n", "data[\"maxCourses\"] = number_in(next_line())\n", "data[\"credits\"] = numbers_in(next_line())\n", "data[\"prerequisites\"] = [[int(v) - 1\n", "    for v in re.split(r'constraint prerequisite\\(|,|\\);', line) if len(v) > 0]\n", "        for line in remaining_lines(skip_curr=True)]\n", "```\n", "\n", "To generate the XCSP$^3$ instance (file), you have to execute:\n", "```\n", "python Bacp.py -data=bacp-data.mzn -dataparser=Bacp_ParserZ.py\n", "```\n", "\n", "If you want the same data to be put in a JSON file, execute:\n", "```\n", "python Bacp.py -data=bacp-data.mzn -dataparser=Bacp_ParserZ.py -dataexport\n", "```\n", "\n", "You obtain a file called 'bacp-data.json' equivalent to the one introduced earlier.\n", "If you want to specify the name of the output JSON file, give it as a value to the option ```-dataexport```, as, e.g., in:\n", "```\n", "python Bacp.py -data=bacp-data.mzn -dataparser=Bacp_ParserZ.py -dataexport=instance0\n", "```\n", "\n", "The generated JSON file is then called 'instance0.json'." ] }, { "cell_type": "markdown", "id": "48427f98", "metadata": {}, "source": [ "## Special Rules when Loading JSON Files" ] }, { "cell_type": "markdown", "id": "e17a304d", "metadata": {}, "source": [ "The rules that are used when loading a JSON file in order to set the value of the PyCSP$^3$ predefined variable *data* are as follows. 
\n", "- For any field $f$ of the root object in the JSON file, we obtain a field $f$ in the generated named tuple *data* such that:\n", "    - if $f$ is a JSON list (or, recursively, a list of lists) containing only integers, the type of *data.f* is 'pycsp3.tools.curser.ListInt' instead of 'list', 'ListInt' being a subclass of 'list'. The main interest is that *data.f* can be directly used as a vector for the global constraint *Element*.\n", "      See the Mario Problem for an illustration.\n", "    - if $f$ is an object, *data.f* is a named tuple with the same fields as $f$.\n", "      See the Rack Configuration Problem for an illustration.\n", "- The rules above apply recursively." ] }, { "cell_type": "markdown", "id": "a712f7d1", "metadata": {}, "source": [ "## Special Rule when Building Arrays of Variables" ] }, { "cell_type": "markdown", "id": "d228a30f", "metadata": {}, "source": [ "When we define a list (array) $x$ of variables with *VarArray()*, the type of $x$ is 'pycsp3.tools.curser.ListVar' instead of 'list'. The main interest is that $x$ can be directly used as a vector for the global constraint *Element*.\n" ] }, { "cell_type": "markdown", "id": "e1229643", "metadata": {}, "source": [ "## Special Values *null* and *None*" ] }, { "cell_type": "markdown", "id": "9a28b8ee", "metadata": {}, "source": [ "When the value *null* occurs in a JSON file, it becomes *None* in PyCSP$^3$ after loading the data file." ] }, { "cell_type": "markdown", "id": "67dd02b0", "metadata": {}, "source": [ "## Loading Several JSON Files" ] }, { "cell_type": "markdown", "id": "a0f2db40", "metadata": {}, "source": [ "It is possible to load data from several JSON files. It suffices to indicate a list of JSON filenames between brackets. 
For example, let 'file1.json' be:\n", "```\n", "{\n", " \"a\": 4,\n", " \"b\": 12\n", "}\n", "```\n", "\n", "let 'file2.json' be:\n", "```\n", "{\n", " \"c\": 10,\n", " \"d\": 1\n", "}\n", "```\n", "\n", "and let 'Test.py' be:\n", "```\n", "from pycsp3 import *\n", "\n", "a, b, c, d = data\n", "\n", "print(a, b, c, d)\n", "\n", "...\n", "```\n", "\n", "then, by executing: \n", "```\n", "python Test.py -data=[file1.json,file2.json]\n", "```\n", "\n", "we obtain the expected values in the four Python variables, because the order of fields is guaranteed (as if the two JSON files had been concatenated); behind the scenes, an OrderedDict is used, and its method *update()* is called." ] }, { "cell_type": "markdown", "id": "e457683f", "metadata": {}, "source": [ "## Combining JSON Files and Named Elementary Values" ] }, { "cell_type": "markdown", "id": "49a4fd22", "metadata": {}, "source": [ "It may be useful to load data from JSON files, while updating some (named) elementary values. This means that, between brackets, we can indicate JSON filenames as well as named elementary values. The rule is simple: any field of the variable *data* takes the last value assigned to it during loading. For example, the command:\n", "```\n", "python Test.py -data=[file1.json,file2.json,c=5]\n", "```\n", "\n", "defines the variable *data* from the two JSON files, except that the field c is set to 5.\n", "However, the command:\n", "```\n", "python Test.py -data=[c=5,file1.json,file2.json]\n", "```\n", "is not appropriate because the value of c will be overridden when considering 'file2.json'.\n", "\n", "Just remember that named elementary values must be given after JSON files." ] }, { "cell_type": "markdown", "id": "965400de", "metadata": {}, "source": [ "## Loading Several Text Files" ] }, { "cell_type": "markdown", "id": "039db62d", "metadata": {}, "source": [ "It is also possible to load data from several text (non-JSON) files. 
It suffices to indicate a list of filenames between brackets, which will then be concatenated just before calling an appropriate parser.\n", "For example, let 'file1.txt' be:\n", "```\n", "5\n", "2 4 12 3 8 \n", "```\n", "\n", "let 'file2.txt' be:\n", "```\n", "3 3\n", "0 1 1\n", "1 0 1\n", "0 0 1\n", "```\n", "\n", "then, when the file 'Test2_Parser.py' is executed after typing:\n", "```\n", "python Test2.py -data=[file1.txt,file2.txt] -dataparser=Test2_Parser.py\n", "```\n", "\n", "we can read a sequence of text lines as if a single file had initially been given with content:\n", "```\n", "5\n", "2 4 12 3 8 \n", "3 3\n", "0 1 1\n", "1 0 1\n", "0 0 1\n", "```\n", "\n", "It is even possible to add arbitrary lines to the intermediate concatenated file. For example, \n", "```\n", "python Test2.py -data=[file1.txt,file2.txt,10] -dataparser=Test2_Parser.py\n", "```\n", "\n", "adds a last line containing the value 10. Because whitespace is not tolerated, one may need to surround additional lines with quotes (or double quotes).\n", "For example, when 'Test2_Parser.py' is executed after typing:\n", "```\n", "python Test2.py -data=[file1.txt,file2.txt,10,\"3 5\",partial] -dataparser=Test2_Parser.py\n", "```\n", "\n", "the sequence of text lines is as follows:\n", "```\n", "5\n", "2 4 12 3 8 \n", "3 3\n", "0 1 1\n", "1 0 1\n", "0 0 1\n", "10\n", "3 5\n", "partial\n", "```" ] }, { "cell_type": "markdown", "id": "d4df93fa", "metadata": {}, "source": [ "## Default Data" ] }, { "cell_type": "markdown", "id": "233db03a", "metadata": {}, "source": [ "Except for \"single\" problems, data must be specified by the user in order to generate specific problem instances.\n", "If data are not specified, an error is raised. However, when writing the model, it is always possible to indicate some default data, notably by using the behaviour of the Python operator *or*. To set a JSON file as the default data file, we must call the function *default_data()*. 
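\n", "\n", "Since ```data``` is *None* when the option ```-data``` is absent, and *None* is falsy, the Python expression ```data or fallback``` evaluates to the fallback in that case. A plain-Python illustration:\n", "```python\n", "data = None  # what PyCSP3 sets when no data is specified\n", "\n", "# 'None or fallback' evaluates to the fallback, so unpacking succeeds\n", "v, b, r, k, l = data or (9, 0, 0, 3, 9)\n", "print(v, k)  # 9 3\n", "```\n", "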
Handling default data is illustrated with the BIBD and BACP problems.\n", "\n", "For BIBD, if we replace:\n", "```\n", "v, b, r, k, l = data \n", "```\n", "\n", "by\n", "```\n", "v, b, r, k, l = data or (9,0,0,3,9)\n", "```\n", "\n", "then we can generate the default instance with:\n", "```\n", "python Bibd.py\n", "```\n", "\n", "For BACP, if we replace:\n", "```\n", "nPeriods, minCredits, maxCredits, minCourses, maxCourses, credits, prereq = data\n", "```\n", "\n", "by\n", "```\n", "nPeriods, minCredits, maxCredits, minCourses, maxCourses, credits, prereq = data or default_data(\"Bacp_example.json\")\n", "```\n", "\n", "then we can generate the default instance with:\n", "```\n", "python Bacp.py\n", "```" ] }, { "cell_type": "markdown", "id": "661a3c8b", "metadata": {}, "source": [ "## Loading a JSON Data File" ] }, { "cell_type": "markdown", "id": "79304963", "metadata": {}, "source": [ "If, for some reason, it is convenient to load some data independently of the option ```-data```, one can call the function *load_json_data()*. This function accepts a parameter that is the filename of a JSON file (possibly given by a URL), and returns a named tuple containing the loaded data. " ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.12" } }, "nbformat": 4, "nbformat_minor": 5 }