Lab_1 (Cont.)

You can also try other aggregates:

|Aggregate|Description|
|---|---|
|\$sum|Calculate the total|
|\$avg|Find the average|
|\$min|Find the minimum value|
|\$max|Find the maximum value|
|\$push|Add the value to a list; duplicates are allowed|
|\$addToSet|Add the value to a set; duplicates are not added|
|\$first|Take the value from the first document in each group|
|\$last|Take the value from the last document in each group|

## Exporting to JSON file

Since the documents are represented in Python as a list of dictionaries, we can convert them to JSON format and save the result to a data file. The procedure is as follows.

First, we retrieve all documents from the collection.

```python
data = list(companies.find())
```

We use the `dumps` function provided by the `json` package to convert the list of documents to a JSON string.

```python
import json
json_str = json.dumps(data)
```

Then, we open a file in write mode (`'w'`) and write the string to the file.

```python
# The with-statement closes the file automatically after writing
with open('companies.json', 'w') as fout:
    fout.write(json_str)
```

## Importing JSON file

In the previous part, we learnt how to load the data from a text file in Tab Delimited Text format. Now, we are going to learn how to load the data from a JSON file.

Before loading the data into MongoDB, we ensure that the `companies` collection is empty. To do so, we can use the `delete_many` function with an empty filter to delete all documents.

```python
companies.delete_many({})
data = list(companies.find())
print(data)
```

    []

First, we open the data file using the `open` function in read-only mode (`'r'`) and load the line from the data file. Assume that the content contains a single line only. Hint: you may use `readline()` to read this line.

### Task 1: Open the `companies.json` file in the correct mode

```python
file = open('companies.json', 'r')
```

```python
json_str = file.readline()
```

We then use the `loads` function provided by the `json` package to convert the data from a JSON string to a list.
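The export and import steps described so far can be checked end to end without a database. The following is a minimal sketch, assuming a small hypothetical list of records and a temporary file named `sample.json` in place of the real collection and `companies.json`. It relies on the fact that `json.dumps` without an `indent` argument produces a single-line string, which is why `readline()` is enough to read it back.

```python
import json
import os
import tempfile

# Hypothetical stand-in records; the real file holds the exported documents.
sample = [{"_id": 1, "Year": "2021"}, {"_id": 2, "Year": "2021"}]

# Temporary location so the sketch does not overwrite companies.json.
path = os.path.join(tempfile.mkdtemp(), "sample.json")

# Export: serialise the list to one JSON string and write it out.
with open(path, "w") as fout:
    fout.write(json.dumps(sample))

# Import: read the single line back and parse it into a list.
with open(path, "r") as fin:
    json_str = fin.readline()

restored = json.loads(json_str)
print(type(restored), len(restored))  # <class 'list'> 2
```

If the file had been written with indentation (for example `json.dumps(data, indent=2)`), it would span several lines and `readline()` would return only the first one; reading the whole file with `read()` would be needed in that case.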
```python
print(type(json_str), len(json_str))
```

    <class 'str'> 16247321

```python
import json
data = json.loads(json_str)
```

Now, we can call the `insert_many` function to insert the documents.

### Task 2: Insert your data into `companies`

```python
companies.insert_many(data)
```

    <pymongo.results.InsertManyResult at 0x7faf3d352b50>

Let's check whether the collection now contains the new documents.

### Task 3: Use the `find()` method to check the contained data

Please display the first five results.

```python
list(companies.find().limit(5))
```

    [{'_id': 1,
      'Year': '2021',
      'Industry_aggregation_NZSIOC': 'Level 1',
      'Industry_code_NZSIOC': '99999',
      'Industry_name_NZSIOC': 'All industries',
      'Units': 'Dollars (millions)',
      'Variable_code': 'H01',
      'Variable_name': 'Total income',
      'Variable_category': 'Financial performance',
      'Value': 757504,
      'Industry_code_ANZSIC06': '"ANZSIC06 divisions A-S (excluding classes K6330, L6711, O7552, O760, O771, O772, S9540, S9601, S9602, and S9603)"'},
     {'_id': 2,
      'Year': '2021',
      'Industry_aggregation_NZSIOC': 'Level 1',
      'Industry_code_NZSIOC': '99999',
      'Industry_name_NZSIOC': 'All industries',
      'Units': 'Dollars (millions)',
      'Variable_code': 'H04',
      'Variable_name': '"Sales, government funding, grants and subsidies"',
      'Variable_category': 'Financial performance',
      'Value': 674890,
      'Industry_code_ANZSIC06': '"ANZSIC06 divisions A-S (excluding classes K6330, L6711, O7552, O760, O771, O772, S9540, S9601, S9602, and S9603)"'},
     {'_id': 3,
      'Year': '2021',
      'Industry_aggregation_NZSIOC': 'Level 1',
      'Industry_code_NZSIOC': '99999',
      'Industry_name_NZSIOC': 'All industries',
      'Units': 'Dollars (millions)',
      'Variable_code': 'H05',
      'Variable_name': '"Interest, dividends and donations"',
      'Variable_category': 'Financial performance',
      'Value': 49593,
      'Industry_code_ANZSIC06': '"ANZSIC06 divisions A-S (excluding classes K6330, L6711, O7552, O760, O771, O772, S9540, S9601, S9602, and S9603)"'},
     {'_id': 4,
      'Year': '2021',
      'Industry_aggregation_NZSIOC': 'Level 1',
      'Industry_code_NZSIOC': '99999',
      'Industry_name_NZSIOC': 'All industries',
      'Units': 'Dollars (millions)',
      'Variable_code': 'H07',
      'Variable_name': 'Non-operating income',
      'Variable_category': 'Financial performance',
      'Value': 33020,
      'Industry_code_ANZSIC06': '"ANZSIC06 divisions A-S (excluding classes K6330, L6711, O7552, O760, O771, O772, S9540, S9601, S9602, and S9603)"'},
     {'_id': 5,
      'Year': '2021',
      'Industry_aggregation_NZSIOC': 'Level 1',
      'Industry_code_NZSIOC': '99999',
      'Industry_name_NZSIOC': 'All industries',
      'Units': 'Dollars (millions)',
      'Variable_code': 'H08',
      'Variable_name': 'Total expenditure',
      'Variable_category': 'Financial performance',
      'Value': 654404,
      'Industry_code_ANZSIC06': '"ANZSIC06 divisions A-S (excluding classes K6330, L6711, O7552, O760, O771, O772, S9540, S9601, S9602, and S9603)"'}]
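With the documents back in the collection, the aggregates listed at the start of this section can be tried on the imported fields. The following is a minimal sketch that only builds the pipeline; `build_summary_pipeline` is a hypothetical helper, not part of PyMongo, and the choice of grouping by `Variable_name` and summarising `Value` is an assumption based on the field names in the output above. It also assumes `Value` is numeric in every document, which the five documents shown do not guarantee for the whole dataset.

```python
def build_summary_pipeline(group_field, value_field):
    """Build a $group pipeline applying several accumulators (hypothetical helper)."""
    return [
        {
            "$group": {
                "_id": f"${group_field}",                 # one group per distinct value
                "total": {"$sum": f"${value_field}"},
                "average": {"$avg": f"${value_field}"},
                "smallest": {"$min": f"${value_field}"},
                "largest": {"$max": f"${value_field}"},
                "units": {"$addToSet": "$Units"},         # distinct units seen in the group
            }
        },
        {"$sort": {"_id": 1}},                            # stable, readable ordering
    ]


pipeline = build_summary_pipeline("Variable_name", "Value")
print(pipeline[0]["$group"]["total"])  # {'$sum': '$Value'}

# With a live connection, the pipeline would be run as:
# for row in companies.aggregate(pipeline):
#     print(row)
```

Keeping the pipeline in a plain Python list makes it easy to inspect or print before sending it to the server.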